--> works.  While not strictly "trade secret" information, care should <--

History in brief:

1998/01/07  Added IV.P and made numerous small changes.
1997/11/13  Added some new sections and many corrections.  General release.
1997/11/03  Several corrections and new additions.  First public draft.
1997/10/30  Complete rewrite; renamed to "Greater Scroll".
1997/08/11  Touched up a bit for Microsoft folks.
1996/11/19  Assorted notes on interactions between dialing options, "visible
            dialing", black holes, and Spooky dial Options.  [ This is the
            version most people are familiar with. ]
1996/11/17  Load balancing makes its debut.
1996/11/11  We now have flat-rate IAPs and associated nastiness.
1996/09/01  Did something.
1996/08/30  Added comment about handling "1-800 addicts".
1996/08/29  Clarified access number handling.
1996/08/27  First draft of the "Great Scroll".


  |                          |
-=*=-  I. System Overview  -=*=-
  |                          |


-= I.A =-  Whassup with this

The WebTV system is a combination of a set-top box and an online service.
The set-top box ("WebTV Internet Terminal" or "WebTV Plus Receiver";
henceforth just "box"), is connected to a television and a phone line.
Once it has successfully dialed into an Internet Service Provider, it
connects to the WebTV service, and great things happen.

The simple act of getting a user connected to a local ISP is surprisingly
difficult.  This document explains the fundamentals of getting connected.
The intended audience is Customer Care, SOC, Network Operations, QA, and
Engineering.  Not all sections are relevant for everyone.


The focus of this document is on the U.S. phone system.  International
issues, including a description of the Japanese phone system, can be found
in the "IntlPhoneNotes" document.

Sections I, II, and III should be generally useful.  Sections IV and V are
more technical and aren't important for everyone to understand.  I've
chosen not to include the internal workings of tellyscript and PhoneDB
generation in this document, because they're complicated, volatile, and
really only necessary for engineering and a few people in SOC and netops.

Recommendations on Customer Care practices (notably with regard to dial
overrides) are simply that: recommendations.  They may or may not be
consistent with current Customer Care policies.



-= I.B =-  Joe gets wired

A brief example should help illustrate the major components of the WebTV
system.

When Joe User brings his box home from the store, the first thing he does
is try to set it up, usually without reading the instructions.  Sometimes
this even works.  When the box is powered on, it listens for a dial tone on
the phone line.  (You can turn this off in the dialing options; if you do,
it waits for a few seconds and then dials blindly.)  If the phone line
hasn't been plugged in or isn't hooked up correctly, the box will complain
that it can't hear a dial tone, and offer to try again or let you tweak the
dialing options.

Joe gets everything wired up and tries again.  This time the box hears a
dial tone, so it dials a toll-free 800 number.  This number is usually
referred to as the "scriptlessd number" or (for historical reasons that
we're hoping to obliterate) the "prereg number".  Most users can connect to
this without trouble once their box is set up properly.

Once connected, Joe's box starts talking to the scriptlessd server.
scriptlessd gets the caller's phone number via a feature called ANI
(Automatic Number Identification) that is similar to CallerID, except that
it works from almost everywhere and can't be blocked.  If the service is
unable to get the user's number from ANI, scriptlessd will put up a screen
asking the user to enter their phone number.

From the user's ANI we know where they are and what the closest POPs are
(POP is Point Of Presence, typically a bank of modems connected to an
Internet Service Provider, or ISP).  The two (someday, three or more) best
POPs are assigned to that user, and put into a set of dialing instructions
called a "tellyscript" (a bad pun on a product from General Magic).  The
tellyscript tells the box which numbers to dial and how to dial them.

After getting the tellyscript, the box hangs up and dials the first POP in
the list, which is hopefully a local call.  If the first number is busy it
will hang up and try another.  After trying each of the POPs twice it will
give up and call a toll-free 800 "fallback" number, the use of which may be
restricted to a few hours per month.  Barring network outages or excessive
local congestion, most users shouldn't need to use the fallback number.
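
The retry behavior just described can be sketched roughly as follows.  This
is Python for illustration only, not the tellyscript language.  The function
names and the round-robin ordering (each POP once, then each POP again) are
assumptions; the text only says that each POP is tried twice before the
fallback number is used.

```python
def dial_sequence(pops, fallback, tries_per_pop=2):
    """Return the numbers in the order a box might attempt them.

    Assumption: the box cycles through the whole POP list once per
    round, rather than retrying one POP back-to-back.
    """
    order = []
    for _ in range(tries_per_pop):
        order.extend(pops)
    order.append(fallback)  # toll-free number of last resort
    return order


def connect(pops, fallback, try_dial):
    """Dial each number in order; return the first that connects, else None.

    try_dial is a caller-supplied function: number -> True on success.
    """
    for number in dial_sequence(pops, fallback):
        if try_dial(number):
            return number
    return None
```

Passing the dialing function in as a parameter keeps the ordering logic
separate from the modem, so it can be exercised without a phone line.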

With a little luck, most users will successfully connect to the WebTV
Network without further intervention.  The box connects to the "headwaiter"
server, which tells it where to go.  Shortly after connecting the box sends
up a "phone log" (sometimes called a "connection log") that shows what
numbers were dialed, what failed, and what ultimately succeeded.  These
logs are used to generate POP failure statistics and to debug problems.

When Joe turns his box off with the keyboard or remote, the tellyscript is
saved in NVRAM (Non-Volatile Random Access Memory, which isn't what we're
actually using but it works the same).  The next time the box is powered
on, it skips the scriptlessd step and dials directly into the local POP.



-= I.C =-  Recap

The "box" is the thing what sits on your television set.  The "service" is
what it talks to when it gets dialed in.  The service is composed of
multiple "servers" that do specific things, like hand out "tellyscripts",
show you the home page, or let you read mail.  The box knows how to find
"scriptlessd", scriptlessd knows how to find the "headwaiter", and
the headwaiter knows how to find all the other servers.

ANI tells us the caller's phone number.  From the ANI data we assign local
POPs to the user.  The specific dialing instructions are contained in a
tellyscript.

Boxes with tellyscripts dial into their local POP, and connect to the
headwaiter.  Boxes without tellyscripts go to "scriptlessd" first to get a
tellyscript, and then hang up and redial a local POP.



  |                               |
-=*=-  II. Operation in Detail  -=*=-
  |                               |


-= II.A =-  Fancy words and TLAs

While the evolution of the US phone system shows a great deal of careful
and occasionally ingenious thought, there are some things about it that
just plain suck.  Before we can go into detail, there are a few terms and
proper nouns that should be defined.

"telco" means "telephone company".  It's a generic term, as in "this telco
billing stuff drives me nuts."  Telco guys like to say telco a lot.  Telco.

"CCMI" is Center for Communications Management Information.  CCMI sells us
a database of call pricing information that is frequently accurate.

"POP" is Point Of Presence, generally a bank of modems with a terminal
server that connects to a network.  The modems are usually part of a "hunt
group", so that you can dial just one number, and if the first line is busy
it "hunts" for the next free one.  When we try to dial a POP, but get a
person instead, we refer to it as dialing a MOM.

You sign up for a "calling plan" when you get telephone service hooked up.
In the Bay Area you usually just choose between flat-rate and measured-rate
service, but in other places you have a wide range of choices.  For
example, by adding a higher monthly charge to your phone bill you could get
flat-rate local calling to a larger area.

A "dial pattern" tells you how many digits to dial when you're calling a
particular number.  In the US, these may be 7, 10, or 11 digits long.  In
the Bay Area you can usually call yourself with a short number like
614-5539 or a long one like 1-650-614-5539, but in other areas the systems
are less lenient.  In some cases it can be more expensive to dial 11 digits
than 7.  The customer's choice of calling plan can affect the dial patterns
that they have to use.

"Tellyscripts" are a WebTV creation; they're programs sent to the box by
the service.  They contain instructions that tell the box how to configure
the modem, which POPs to call, and how it should dial them.

"LEC" is Local Exchange Carrier.  These are the guys who handle local calls
and "local toll".  Pacific Bell is our LEC.  CLECs are Competitive Local
Exchange Carriers, a new kind of carrier made possible by the 1996
Telecommunications Act.  You too can run your own phone company.

"RBOC" is Regional Bell Operating Company.  These are the "baby Bells" that
got spun out of AT&T in 1984.  Pacific Bell is an RBOC.  Sometimes
these are just referred to as "BOC"s.

"IOC" and "UOC" are CCMI abbreviations for Independent Operating Company
and Unknown Operating Company.  Contrast with BOC.  IOCs tend to be smaller
phone companies or CLECs; UOCs are usually phone companies run by rural
cooperatives or out of somebody's garage.  An IOC that CCMI doesn't know
anything about is a UOC.

"IXC" (sometimes "IEC") is Inter-eXchange Carrier.  This is a fancy term
for "long distance company" that telco people like to throw around.  AT&T
is an IXC.  When you make a long-distance call, the IXC pays money to the
LEC where the call came from and the LEC where the call went to, so calls
that avoid IXCs tend to be cheaper.

"LATA" is Local Access Transport Area, a geographical region defined by the
phone companies.  The way things traditionally worked is that your LEC
handles local calls and intra-LATA (in the same LATA) toll calls, while
your IXC handles inter-LATA (between LATA) toll calls.  So a toll call to a
location 20 miles north might be handled by Pacific Bell, while a similar
call in the other direction might be handled by AT&T, based on where the
LATA boundaries fall.  Calls that cross state boundaries follow an even
more mysterious set of rules.

The "Telecommunications Act of 1996" really screwed everything up.  Your
IXCs can be LECs, CLECs can provide local service with the LEC's equipment,
and generally anybody can do anything.  This is why MCI can offer local
service now.

"PIC" is Primary Interexchange Carrier.  This term can be used both as a
verb and an adjective.  Your phone line can be "PICed" to use a specific
carrier for your IXC, and more recently you can have an intra-LATA PIC done
for local toll calls.  A "PIC code" is a sequence of digits that you can
enter before dialing a number to choose a different carrier; examples are
10288 (1-0-ATT) or 10321 (Telecom*USA's 10-3-2-1 program).  "PIC charges"
are the fees that your IXC pays to your LEC when you change your long
distance company.  The PIC code format is in the process of changing from
10XXX to 101XXXX.

"Tariffs" tell you how much a call between two points costs.  For long
distance calls, the tariffs from the LECs on both ends and the relevant IXC
all have to be factored in.

"PUC" is Public Utilities Commission.  The PUC in each state has a great
deal of control over the tariffs that the phone companies use.  There are
places where a long-distance call handled by AT&T is completely free,
because the PUC decided that it should be.

"Local calls", in the telco world, are not necessarily free calls.  The
difference between local and toll is defined by the tariffs, which are
filed by the phone companies and monitored by the PUCs.  Pacific Bell
defines "zone 3" calls, which charge per-minute rates even to subscribers
with flat-rate plans, as local.  In the WebTV world we try to define
"Local" as least-cost and "Expensive Local" (ExpLocal) as any local call
that is more expensive than the minimum.  We calculate the minimum by
figuring out what it would cost for the customer to call himself.  Any
local call that costs more is labeled ExpLocal.
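
A minimal sketch of the Local/ExpLocal rule, assuming we already have a cost
for each call that the tariffs define as local.  The function name, the
destination names, and the costs (in cents) are made up for illustration;
this is not the PhoneDB generator's code.

```python
def classify_local_calls(self_call_cost, local_call_costs):
    """Label each local destination as "Local" or "ExpLocal".

    self_call_cost: what it costs the customer to call himself, which
        is used as the minimum.
    local_call_costs: dict of destination -> cost, for calls that the
        tariffs already define as local.
    """
    labels = {}
    for dest, cost in local_call_costs.items():
        labels[dest] = "Local" if cost <= self_call_cost else "ExpLocal"
    return labels
```

With a flat-rate plan the self-call cost is zero, so a per-minute "zone 3"
destination would come out as ExpLocal under this rule.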

The "rate center" is a geographic point used for billing purposes.  "MTS"
(Message Toll Service) coordinates are based on the rate center.  The cost
of a long distance call is based on "major MTS coordinates" for calls over
40 miles, and "minor MTS coordinates" for calls under 40 miles.  For local
calls the "wire center" coordinates are used.  Yes, it could be more
complicated: the coordinates are specified in "V&H" (Vertical and
Horizontal) units, 1670 feet each.
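
For the curious, here is a small sketch of how a distance falls out of a pair
of V&H coordinates.  The formula (square root of the sum of the squared
differences, divided by ten) is the usual telco mileage calculation rather
than something spelled out in this document, and the function name is just
for illustration.  It agrees with the unit size above: one V&H unit works out
to the square root of 0.1 miles, or about 1670 feet.

```python
import math

FEET_PER_MILE = 5280


def vh_distance_miles(v1, h1, v2, h2):
    """Distance in miles between two points given in V&H units."""
    return math.sqrt(((v1 - v2) ** 2 + (h1 - h2) ** 2) / 10.0)


# One V&H unit, expressed in feet (roughly 1670).
one_unit_feet = vh_distance_miles(0, 0, 1, 0) * FEET_PER_MILE
```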

"POTS" is Plain Old Telephone Service.  The term is used to differentiate
standard phone service from things like ISDN or cellular.

"C.O." is Central Office.  In the typical house or apartment, a pair of
copper wires runs from your telephone to the central office.  The distance
between your phone (or, more importantly, your WebTV box) and the central
office, and how well the wires are shielded, can affect the quality of your
phone connection and hence your modem connect rate.

"NPA/NXX" is the obfuscated term for area code and prefix.  If your phone
number is 650-614-5539, your NPA is 650 and your NXX is 614.  The NPA and
NXX are enough to identify where the call is coming from.  The last four
digits of the phone number are sometimes called the "subscriber number".
In some contexts the term "exchange" is synonymous with NPA/NXX.
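
Pulling a number apart into these pieces is simple enough to sketch.  The
function name is made up; the only facts used are the ones above (3-digit
NPA, 3-digit NXX, 4-digit subscriber number) plus the optional leading "1".

```python
def split_number(number):
    """Split a 10-digit NANP number into (NPA, NXX, subscriber number)."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]  # drop the leading long-distance '1'
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number: %r" % number)
    return digits[:3], digits[3:6], digits[6:]
```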

An "Exchange Area" is a collection of NPA/NXXs for which the billing is
identical.  For example, two calls from anywhere in Palo Alto will have the
same cost so long as both callers have the same calling plan and service
providers.  Exchange areas may include dozens of NPA/NXXs or might only
have one.  They might overlap geographically (because of paging/cellular
exchanges), but each NPA/NXX is part of only one exchange area.

"LCA" is short for Local Calling Area.  The LCA for Palo Alto is the set of
exchange areas that are local calls from the Palo Alto exchange area.  Put
more simply, if you're a local call for me, then you're in my LCA.  LCAs
may overlap.  LCAs aren't necessarily symmetric; just because you are a
local call for me doesn't mean that I am a local call for you.

"NANP" is the North American Numbering Plan.  The Plan defines all the area
codes, how dialing patterns will work in the future, and other dry
subjects.  It's NANP rather than USNP because it applies to Canada, Guam,
and places out in the Caribbean, all of which are part of North America if
you lean back and squint.  It does not cover Mexico.

"ISP" and "IAP" are Internet Service Provider and Internet Access
Provider.  They are essentially the same thing, with a subtle and
unimportant difference.  We usually refer to them as IAPs.  Concentric
Network Corp. (cnc), PSINet, Inc. (psi), and UUNET Technologies, Inc.
(uunet) are examples of IAPs.

The "backhoe" is a large piece of construction equipment used for digging
trenches and cutting through network cables at inopportune moments.

The "PhoneDB" is a WebTV creation that combines the CCMI data with a list
of POPs from several IAPs, and comes up with POP assignments for every
NPA/NXX.  (If you understand what I just said, you're ready to graduate.)
The POP-O-Rama web page lets you do queries on current and past PhoneDBs.



-= II.B =-  Dial patterns

People who grew up in California were spoiled by Pacific Bell's coherent
dialing pattern system.  For the most part, you can dial to any point
within the same area code by entering a 7-digit number, and you get to
numbers in other area codes by entering an 11-digit number.  Dialing
numbers in the same area code using an 11-digit number is allowed.

Other parts of the country aren't as straightforward.  There are actually
four kinds of calls you can make:

  HL - Home area code, Local call.  Calls within Mountain View are HL.
  HT - Home area code, Toll call.  Sunnyvale (408) calling Santa Cruz (408).
  FL - Foreign area code, Local call.  Mountain View (650) to Sunnyvale (408).
  FT - Foreign area code, Toll call.  Mountain View to New York.

Each of the four types can have a different "expected" dialing pattern, as
well as a "permitted" dialing pattern.  Certain combinations have
unpleasant consequences.
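
The four call types and the three digit counts can be sketched like this.
The function names are invented for illustration.  Note that "is_local" is
an input here; in real life it depends on tariffs and the customer's calling
plan, which is exactly the part the service often can't know for sure.

```python
def call_type(caller_npa, callee_npa, is_local):
    """Return "HL", "HT", "FL", or "FT" for a call."""
    area = "H" if caller_npa == callee_npa else "F"
    cost = "L" if is_local else "T"
    return area + cost


def format_for_dialing(npa, nxx, subscriber, digits):
    """Build the string to dial for a 7-, 10-, or 11-digit pattern."""
    local = nxx + subscriber
    if digits == 7:
        return local
    if digits == 10:
        return npa + local
    if digits == 11:
        return "1" + npa + local
    raise ValueError("dial pattern must be 7, 10, or 11 digits")
```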

HL is almost always 7 digits, but some places (like Maryland) require
10-digit dialing for *all* local calls.  Yes, you have to include the area
code to call your neighbor down the street.  Enlightened areas like
California have 11-digit dialing as a "permitted" HL pattern.

HT is generally 7, 11, or both.  Places that require 7-digit dialing for
home/local calls and require 11-digit dialing for home/toll calls are
troublesome, because the number of digits depends on whether the
destination is a local call, and the definition of "local" depends on your
calling plan.  In many cases there is no way for WebTV to know ahead of
time how many digits the box should dial.  Guessing wrong results in a
recording from the phone company.

FL is usually 10 or 11, but in some cases is 7.  In nasty cases it's 7 and
10/11 aren't allowed at all.  It's nasty because we are *required* to dial
a 7-digit number into a different area code when the call is local, but
would be dialing an 11-digit number if the call were toll.  So if we think
something is local when it really isn't, we could be dialing a 7-digit
number in the *caller's* area code rather than the *callee's* area code, and
the WebTV box will be waking up somebody's grandmother.  The service takes
great pains to avoid this situation.

FT is always 11, no exceptions.


Using the right pattern can be important.  For example, there are places
where you are either not allowed to dial 11-digit numbers for local calls,
or are charged more than you would for dialing 7 (presumably because the
call is routed through the IXC as soon as the leading '1' is seen, instead
of being handled by the LEC).

The CCMI database has "hints" on dialing patterns, but they are sometimes
inaccurate.  Because the dialing pattern depends on whether a call is local
or toll, it depends on what your calling plan defines as being local.  This
makes it a bit of a challenge to get the dial pattern right.  To work
around these issues, the WebTV service takes the best guess it can, and
remembers the cases that succeed.

The service remembers a set of dialing patterns that looks like this
(output is from "dpedit", the Dial Pattern EDITor):

 The dial patterns for '01fad82501b002ba' (ANI=004154631671) are:
  S  # ANI          POP          Mode
  +  0 415-614-5539 415-233-0570 7-digit
  +  1 415-614-5539 415-322-0489 11-digit
  +  2 415-463-1671 415-233-0570 7-digit
  I  3 415-463-1671 415-666-9999 7-digit
  +  4 415-463-1660 415-322-0489 11-digit
  +  5 415-463-1660 415-233-0570 7-digit
  N  6 415-463-1660 510-742-0207 11-digit
  -  7 <empty>

Each line is one entry in the dial pattern table.  It has the person's ANI
at the time the call was placed, the POP number that the person was
calling, and how many digits were used to dial it.  We have to record the
ANI, because if they move the box to a different place, or even to a
different phone line with a different calling plan, the dial patterns can
be different.  Same story for area code splits (see next section).

When a user first signs up, or first appears at a new number, we have no
information about a person's dial patterns.  The tellyscript that gets sent
down will first try one pattern, then if that fails, it will try the next.
When one succeeds, we add an entry to the table.

Suppose the tellyscript for Palo Alto first tries 7-digit dialing and then
tries 11-digit dialing.  What happens if the POP happens to be busy on the
first attempt, but succeeds on the second?  We will end up recording a
success with 11-digit dialing, and will use that from then on.  This isn't
perfect, but it's hard to tell the difference between different kinds of
failures ("all circuits are busy" sounds just like "you don't need to dial
a 1 in front of that" to the modem).  Most of the time it works.
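
Here is a rough sketch of the try-then-remember behavior, including the
busy-signal wrinkle.  The table is modeled as a dictionary keyed on (ANI,
POP); that layout, the function names, and the assumption that a remembered
pattern is the only one tried are mine, not a description of the service's
actual storage.

```python
def choose_patterns(table, ani, pop, default_order=(7, 11)):
    """Return the dial patterns to try, in order, for this ANI and POP."""
    known = table.get((ani, pop))
    if known is not None:
        return [known]  # a remembered success wins
    return list(default_order)


def dial_and_learn(table, ani, pop, try_dial, default_order=(7, 11)):
    """Try each pattern; record and return the first that succeeds.

    try_dial is a caller-supplied function: (pop, digits) -> bool.
    It can't say *why* an attempt failed, which is why a busy signal
    on the first try causes the second pattern to be recorded.
    """
    for digits in choose_patterns(table, ani, pop, default_order):
        if try_dial(pop, digits):
            table[(ani, pop)] = digits
            return digits
    return None
```

Because the key includes the ANI, patterns learned at one phone number are
simply not found when the box shows up at another.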

A problem that occasionally surfaces is with customers who turn "audible
dialing" on and get excited when the first attempt fails.  If they were to
wait for a minute or two until the box timed out and tried the next number,
everything would work out fine; but instead they hear the first attempt
fail and immediately call Customer Care.  The solution is NOT a dial
override, but rather to encourage the customer to have more patience.  (In
one case the user was told to use the 32768 secret code, which clears out
all of the settings in NVRAM.  This turned off audible dialing.  The
customer successfully dialed in shortly thereafter.)

It is also possible for a customer's dialing patterns to change over time,
perhaps because they change local calling plans.  This is not handled
automatically, because the service can't easily distinguish a dead POP from
a bad pattern.  Once again, the solution is NOT a dial override.  The
"dpedit" utility can be used to adjust the dial patterns.  Once changed,
send the user through the "new number" routine so they go back through
scriptlessd and get a script with the updated data.

See the dpedit "README" file for details on using it.


Sometimes there are exceptions to dial pattern rules within a certain
area.  For example, there was an InternetMCI POP at 415-482-2900 in Redwood
City that was a local call from Palo Alto.  Every other call to Redwood
City could be dialed with 7 or 11 digits, but not that one.  If you didn't
use 7-digit dialing, you got a recording chastising you for being so
clueless.  The moral of the story is that there's no way to know for sure
what will work until it's tried.

Things can get pretty weird.  In the 608-326 exchange in Wisconsin, if you
call "873-xxxx", you get a local number in Iowa at 1-319-873-xxxx.  If, on
the other hand, you dial 1-608-873-xxxx, you make a toll call to another
point in Wisconsin.  Even though you're in the 608 area code, and there's a
608-873-xxxx, your call to "873-xxxx" goes to a different area code.  In
this particular case, we're allowed to dial 1-319-873-xxxx, so by using
11-digit dialing there's no ambiguity.


One other note: the list of dial patterns only determines whether the box
dials 7, 10, or 11 digits when calling a POP.  It does *not* decide which
POP a customer will get, or in what order they will be tried.



-= II.C =-  Area code splits

Area code splits come in two varieties, geographical splits and overlays.
Geographical splits are done like the 415/510 and 415/650 splits, where
a geographic region gets a different area code.  With overlays, the same
area gets two area codes.  Usually one area code is used for voice, while
the other is used for FAX machines, pagers, and cellular phones.

For both kinds of splits, the transition is done over a period of a few
months.  The following chart illustrates the process, assuming that
somebody in San Francisco at 415-659-0610 and somebody in Palo Alto at
415-614-5539 (changing to 650-614-5539) are trying to call each other.

(1) Pre-split.  The 650 area code does not exist yet.

  From S.F., dialing 614-5539 works.
  From S.F., dialing 1-415-614-5539 works.
  From S.F., dialing 1-650-614-5539 results in a "what the hell area code
    is that?" message.

  From P.A., dialing 659-0610 works.
  From P.A., dialing 1-415-659-0610 works.

  The ANI for the person in Palo Alto is 415-614-5539.

(2) "Permissive" dialing.  You are allowed, but not required, to dial 650.

  From S.F., dialing 614-5539 works.
  From S.F., dialing 1-415-614-5539 works.
  From S.F., dialing 1-650-614-5539 works.

  From P.A., dialing 659-0610 works.
  From P.A., dialing 1-415-659-0610 works.

  The ANI for the person in Palo Alto is now 650-614-5539.  (Sometimes the
    local phone companies blow this, and do it early or late.  It's unwise
    to assume that the ANI will change at the very start of the permissive
    period.)

(3) "Mandatory" dialing (usually starts about 6 months after "permissive").

  From S.F., dialing 614-5539 gets a "you need to dial 650" recording.
  From S.F., dialing 1-415-614-5539 gets a "you need to dial 650" recording.
  From S.F., dialing 1-650-614-5539 works.

  From P.A., dialing 659-0610 gets a "you need to dial 415" recording.
  From P.A., dialing 1-415-659-0610 works.

(4) Eventually the no-longer-used numbers get reassigned.

  From S.F., dialing 614-5539 gets a wrong number.
  From S.F., dialing 1-415-614-5539 gets a wrong number.
  From S.F., dialing 1-650-614-5539 works.

  From P.A., dialing 659-0610 gets a wrong number.
  From P.A., dialing 1-415-659-0610 works.


What makes area code splits especially frustrating for us is that the dial
pattern can change.  Before the split, if you were in Palo Alto and calling
a San Francisco POP at 415-659-0610, you could just dial 659-0610.  After
the split, you would be calling a number in a different area code, and
would be required to dial 1-415-659-0610.  Even though you haven't moved,
your ANI has changed out from under you.  The WebTV service can't fix you
if you can't log in, and guess what, you can't log in except through the
800 number.

The good news is that if you make your box go back through scriptlessd, it
will detect that your ANI has changed, and all of your old dial patterns
will be ignored because they were tied to your old ANI.  Ideally we
wouldn't have to put the users through this manual step, and would either
send them back through scriptlessd automatically or just make the change to
their area code directly.  But how do we do this?

One solution here is to have an 800 fallback number that also gets your
ANI, and compare the current ANI with the ANI on record.  If all of your
local POPs are failing because we're using the wrong dial pattern, you end
up on the fallback number, and once there we can automatically detect that
it's because your area code changed.  Also, given sufficiently detailed
information about area code splits, we could program the box to dial a
different set of numbers depending on whether "today" is pre-split or
post-split.  The latter solution isn't perfect, because if the box loses
power it forgets what day it is, but it's a little cleaner.
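
Both ideas are easy to sketch.  The function names are hypothetical, and the
cutover date is something we would have to be told about in advance; nothing
here reflects code that exists in the service or the box.

```python
import datetime


def last_ten(ani):
    """Strip punctuation and any 2-digit prefix, leaving the 10-digit number."""
    return "".join(ch for ch in ani if ch.isdigit())[-10:]


def area_code_changed(ani_on_record, current_ani):
    """True if the two numbers differ only in their area code."""
    old, new = last_ten(ani_on_record), last_ten(current_ani)
    return old[:3] != new[:3] and old[3:] == new[3:]


def number_to_dial(today, mandatory_date, pre_split, post_split):
    """Pick the dial string based on whether mandatory dialing has started."""
    return post_split if today >= mandatory_date else pre_split
```

The date-based version is only as good as the box's clock, which is the
weakness noted above: a box that has lost power doesn't know what "today" is.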

You might be tempted to think that dialing the full 11-digit number every
time would solve this problem.  In the San Francisco/Palo Alto example
above, the 11-digit pattern worked correctly in every case.  Unfortunately,
as mentioned in the section on dial patterns, 11-digit calls might either
be disallowed or might be more expensive than a 7-digit call to the same
number.

A particularly troublesome area code split happened in Maryland in the
middle of 1997.  Not only did the area code split, but all local calls
suddenly had to be dialed with 10-digit numbers.  This change required that
the service "forget" all 7-digit patterns for callers whose ANI showed them
to be in Maryland.  The service config option IgnoreDialPattern was added
to deal with changes like this in the future.



-= II.D =-  Semi-automatic number identification

When we get the caller's phone number via ANI on the 800 scriptlessd
number, we get a little more data with it.  A typical ANI string looks like
"006506145539".  The last 10 digits are the phone number.  The first two
are the OLS (Originating Line Screening) code.  This allows us to tell if
somebody is calling in from a prison, hotel room, or pay phone rather than
a standard phone line.

At least, it would, if we were able to get at the OLS code with our
systems, which we can't.  But I digress.

If you're calling in from a point in the United States, Canada, or affiliated
areas like Puerto Rico, chances are the ANI number is valid.  There are
specific regions that don't support ANI, however, and there are times when
the ANI just doesn't seem to want to show up.

In cases like these, the service will ask the user to enter their own phone
number.  It doesn't need to be exactly right; it just needs to be in the
same "exchange area" as the box.  If the person has two phone lines, and
puts in the voice number, it will usually work just fine.  If the service
for the lines is provided by different local phone companies, though, the
billing can be quite different, so the system works best when the number
comes from ANI.

To make it easier to diagnose cases where the user entered the wrong value
for their phone number, the service labels "manual ANI" entries by
replacing the OLS code with a WebTV-defined value.  Some interesting values:

  99 (+ 10 digits) - number was entered on "enter your phone number" screen.
  98 (+0000000000) - special code used; probably an international demo box.
  97 (+0000000000) - special code used; probably an international demo box.
  96 (+ 10 digits) - number changed with dpedit or clientpopedit.
  95 (+0000000000) - service is ignoring ANI values (never on production!)

If somebody is dialing a totally inappropriate set of POPs, and their ANI
number starts with "99", chances are they entered the wrong number on the
"enter your phone number" screen.  WebTV isn't responsible for toll charges
incurred by sticky-fingered users, but diagnosing this quickly will leave
the customer happier.  Sometimes you need to check the "ANI history" to
see if they blew it at some point in the past.
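
When staring at a raw ANI string, something like the following sketch can
help.  The code table is copied from the list above; the function name and
the returned dictionary layout are assumptions made for illustration.

```python
MANUAL_ANI_CODES = {
    "99": "entered on the 'enter your phone number' screen",
    "98": "special code; probably an international demo box",
    "97": "special code; probably an international demo box",
    "96": "changed with dpedit or clientpopedit",
    "95": "service is ignoring ANI values",
}


def parse_ani(ani):
    """Split a 12-digit ANI string into its 2-digit prefix and phone number.

    "manual" is None when the prefix is not one of the WebTV-defined
    codes, i.e. the number presumably came from real ANI.
    """
    if len(ani) != 12 or not ani.isdigit():
        raise ValueError("expected 12 digits: %r" % ani)
    prefix, number = ani[:2], ani[2:]
    return {
        "prefix": prefix,
        "npa": number[:3],
        "nxx": number[3:6],
        "subscriber": number[6:],
        "manual": MANUAL_ANI_CODES.get(prefix),
    }
```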


What happens if we successfully get the user's ANI but can't recognize the
number?  This happens when new exchanges are added faster than CCMI can
keep up.  In cases like this, we give the user the "global default" POP,
which is usually an 800 number embedded in the PhoneDB.

When we finally put out a PhoneDB that does recognize their ANI, we will
automatically send them a new tellyscript with the appropriate POPs when
they next visit the headwaiter.  If the PhoneDB "forgets" some numbers,
possibly because an old area code split has caused some exchanges to cease
to exist, we will simply stop updating their tellyscript until the next
time they go through scriptlessd.  (The service should actually force them
back through scriptlessd once, in case their ANI changed as part of an area
code split but we never caught it.  This is currently an open bug.)

If we get the ANI, and we recognize it, but it's for an area that we don't
yet support (e.g. Puerto Rico), we don't send the user a tellyscript at
all.  Instead they just get a message saying that WebTV isn't yet supported
in their area.


What happens if we don't get their ANI, and it's a "Classic" box doing a
flash download?  Now we're in trouble: we don't have their ANI, and we
can't put up a user interface and ask because the "Classic" flash
downloader doesn't *have* a user interface.  If they're talking to
scriptlessd, they must be brain-dead, probably from an earlier failed
download.  We temporarily send them to an 800 number (the "NoANI" number),
until they can finish the download.  When the download finishes
successfully, the box will automatically go back through scriptlessd.

This has the added bonus of giving most users a more stable environment for
doing the download, because the POP they're calling is under our control.
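
The cases above boil down to a small decision tree, sketched here.  The
function name, the return strings, and the idea of representing the PhoneDB
as two sets of "NPANXX" strings are all simplifications for illustration.
What happens to a Classic box doing a flash download when ANI *does* show up
isn't covered in the text, so this sketch just treats it like any other box.

```python
def scriptlessd_action(ani, known, supported, flash_download=False):
    """Decide what to hand a box that has just called in.

    ani: 12-digit ANI string, or None if ANI didn't show up.
    known: set of "NPANXX" strings the PhoneDB recognizes.
    supported: subset of known where service is offered.
    flash_download: True for a "Classic" box with no user interface.
    """
    if ani is None:
        if flash_download:
            return "send to NoANI 800 number"
        return "ask user for phone number"
    exchange = ani[2:8]  # skip the 2-digit prefix; take NPA + NXX
    if exchange not in known:
        return "assign global default POP"
    if exchange not in supported:
        return "show not-supported message"
    return "send tellyscript with local POPs"
```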


One of the pitfalls of using ANI is that it only works when the user dials
into an 800 number.  It's very important that we know where the box is,
because if we have the wrong value for their ANI we will be handing out the
wrong set of POPs.  If one of those POPs is a 7-digit number, we could be
dialing a 7-digit number in the wrong area code, and call a MOM instead.
On the other hand, 800# calls are expensive, and we have limited capacity
on the modem racks, so we can't have the box dial into the 800 number every
time the box powers up.

The current approach for dealing with this is to assume that the box might
have moved whenever it loses power.  We display a message the first time
the box turns on after losing power that shows their phone number (e.g.
"650-614-XXXX"; the last four digits are blanked in case they return the
box to the store).  If the user has moved the box to a different phone
number, they can just hit "Moved", and the box will go back through
scriptlessd.  Versions of the box before client 1.2 weren't able to display
the ANI number in the dialog.


A practical issue has arisen on a few occasions: a helpful store salesman
runs the box through an initial scriptlessd connection before the customer
takes it home.  If the customer gets home and asserts that the box hasn't
moved, they will end up with a tellyscript for the store's ANI rather than
their own ANI.  Because most of the units on shelves are client 1.0, they
can't display the partial ANI in the "have you moved" dialog, so the
customer has no way to notice the mismatch.

The workaround was to put a test at the start of registration that figures
out how long it has been since the box went through scriptlessd.  If it has
been more than a certain amount of time, the box is thrown out and must
come back in through the 800 number.  In the usual (non-helpful-salesman)
case, the box will proceed to registration within a few minutes of visiting
scriptlessd, so with a suitably defined interval -- currently 15 minutes --
we can solve the problem without creating a new one.
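
A minimal sketch of the staleness test, in Python.  The function and
constant names are made up for illustration; only the 15-minute interval
comes from the text, and the real check lives in the service, not in
anything resembling this.

```python
from datetime import datetime, timedelta

# Interval taken from the text; the name is hypothetical.
MAX_SCRIPTLESSD_AGE = timedelta(minutes=15)

def must_revisit_800_number(last_scriptlessd_visit, now):
    """Return True if the box should be sent back through the 800 number."""
    return now - last_scriptlessd_visit > MAX_SCRIPTLESSD_AGE

visit = datetime(1997, 11, 3, 12, 0, 0)
# Usual case: registration starts a few minutes after scriptlessd.
usual = must_revisit_800_number(visit, visit + timedelta(minutes=3))
# Helpful-salesman case: the box sat in a bag for a day first.
stale = must_revisit_800_number(visit, visit + timedelta(hours=26))
print(usual, stale)
```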



-= II.E =-  The local, the toll, and the ugly

Figuring out what's local and what's not is far more difficult than you
might expect.  The single biggest obstacle is the lack of completely
accurate data.  What we get from CCMI is fairly accurate, but they're
collecting tariff data from dozens of companies on hundreds of calling
plans for 25,000 different exchange areas.  With that much data, in a
system as convoluted as the U.S. phone system, there are bound to be
problems, and there's an awful lot of "process" between finding a problem
and getting it fixed.

We also have trouble with missing data.  Some LCAs are entirely
unsupported, others are partially supported.  A "partially supported" LCA
is one where the data is loaded once, when somebody asks for it.  It isn't
kept up to date, and there is no pricing information associated with the
local calls.  Based on this data the PhoneDB generator can tell that a call
is local, or at least *was* local in the recent past, but can't tell how
much it costs.  This makes it impossible to distinguish between "Local" and
"Expensive Local".

The myriad filters and fancy footwork we do when generating a PhoneDB are
outside the scope of this document.  What's important is to understand how
far you can trust the data and why it might be wrong, so that you can
understand POP-O-Rama output and try to differentiate customer error from
CCMI error.


Here's an example of output from the "lookuppop" tool, which generates the
output for the POP-O-Rama web page:

For 561-357-0000 from W PALM BCH, FL (base cost=0):
  cnc/561-227-0012 in or near "West Palm Beach, FL" (W PALM BCH, FL)
    LOCAL 0.0mi  [wc=7.6mi] cost=0 
      --> 227-0012 then 1-561-227-0012
  uunet/561-681-9557 in or near "West Palm Beach, FL" (W PALM BCH, FL)
    LOCAL 0.0mi  [wc=5.4mi] cost=0 
      --> 681-9557 then 1-561-681-9557
  cnc/561-226-0010 in or near "Boca Raton, FL" (BOCA RATON, FL)
    ExpLocal 23.7mi  [wc=19.0mi] cost=1840 
      --> 226-0010 then 1-561-226-0010
  uunet/561-368-8801 in or near "Boca Raton, FL" (BOCA RATON, FL)
    ExpLocal 23.7mi  [wc=19.0mi] cost=1840 
      --> 368-8801 then 1-561-368-8801
  psi/954-971-5720 in or near "Pompano Beach, FL" (POMPANOBCH, FL)
    toll* 31.9mi  [wc=26.6mi] cost=2927 
      --> 1-954-971-5720
  uunet/954-486-4806 in or near "Fort Lauderdale, FL" (FTLAUDERDL, FL)
    toll 39.9mi  [wc=31.9mi] cost=2927 
      --> 1-954-486-4806
  cnc/954-845-0336 in or near "Ft. Lauderdale, FL" (FTLAUDERDL, FL)
    toll 39.9mi  [wc=36.4mi] cost=2927 
      --> 1-954-845-0336
  cnc/305-651-1819 in or near "Miami, FL" (NORTH DADE, FL)
    toll 53.5mi  [wc=46.8mi] cost=2927 
      --> 1-305-651-1819

The first line identifies the exchange where the caller is.  In this case,
I asked for "561-357", and it filled in the last four digits with zeros
(remember, you only need the NPA and NXX to identify the location).  The
location name is "W PALM BCH, FL".  The names are cryptic because the CCMI
database only has space for 10 characters, and they're all upper case.
"FL" is the state, in this case Florida.  "Base cost" is what we computed
it would cost for somebody in the 561-357 NPA/NXX to call themselves, based
on a call of a certain duration at a certain time of day.  DO NOT tell this
cost to a customer!  It might be based on a calling plan other than what
the customer has, and we don't want to be responsible for giving out cost
figures that are based on inappropriate or possibly even inaccurate data.

After the first line are eight sets of three lines, with one set for each
POP.  The first line in each set identifies the POP.  "cnc/561-227-0012"
means it's a Concentric Networks POP at 561-227-0012.  There are two city
names, "West Palm Beach" and "W PALM BCH".  The latter is supplied by
CCMI.  The former is sent to us by the IAP, can be edited fairly easily,
and is displayed to the customer in the "have you moved" dialog.  The names
don't always match up; note that the last entry says "Miami" and "NORTH
DADE".  This is generally because the CCMI entry describes things from the
telco perspective.  For example, the Pacific Bell phone book describes
Cupertino as being in "San Jose 2", and CCMI shows Cupertino numbers as
being in "SAN JOSE W".  Ditto for Menlo Park, which appears to be in PALO
ALTO.  In general, the "nice" name is more accurate.  If you believe the
two are totally out of whack, ask the SOC to look into it.

There is no "nice" name on the top line, because (1) we only have "nice"
names for places where the POPs are, and (2) the NPA/NXX isn't enough to
tell you what city the person lives in.  Some NPA/NXXs cover more than one
city.

The second line in each set tells you about the cost of a call from the
caller's NPA/NXX to that POP.  The first word is one of the following:

  LOCAL - we believe the call is local, and that the cost of the call is
    the same as if the user called themselves.
  ExpLocal - CCMI says it's a local call, but it's more expensive to call
    than other local calls.  Zone 3 calls in California are ExpLocal.
  PsuedoLocal - equivalent to ExpLocal in almost every respect.  Explained
    below.
  toll - this is a toll call.  It might be a "local toll" handled by the
    LEC or a long-distance call handled by an IXC.

(In the ancient days of yore, there was a distinction between "LOCAL" and
"local".  The LocalMustEqualCostToSelf feature removed this distinction.)

Regardless of how the calls price out, local calls always come before
ExpLocal, and ExpLocal calls always come before toll.  Toll calls that are
cheaper than local calls are extremely rare, so we always prefer the local
calls just in case there's an error in the tariff data.
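
The ordering rule can be sketched in a few lines of Python.  This is only
an illustration: the tuple layout and names are assumptions, the last entry
in the list is invented to show a cheap toll call, and the real PhoneDB
generator applies filters and load balancing that are ignored here.

```python
# Rank of each call type; lower sorts first.  PsuedoLocal is ranked with
# ExpLocal, since the two are described as equivalent.
KIND_RANK = {"LOCAL": 0, "ExpLocal": 1, "PsuedoLocal": 1, "toll": 2}

def order_pops(pops):
    """Sort (name, kind, cost) tuples by call type first, then by cost."""
    return sorted(pops, key=lambda p: (KIND_RANK[p[1]], p[2]))

pops = [
    ("uunet/954-486-4806", "toll", 2927),
    ("cnc/561-226-0010", "ExpLocal", 1840),
    ("cnc/561-227-0012", "LOCAL", 0),
    ("made-up/cheap-toll", "toll", 100),   # hypothetical entry
]
ordered = [name for name, kind, cost in order_pops(pops)]
print(ordered)
```

Note that the made-up 100-unit toll call still sorts after the 1840-unit
ExpLocal call, which is the whole point of the rule.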

Entries with an asterisk (i.e. "toll*") denote a certain kind of IAP.  This
is explained later.  Usually you should just ignore the asterisk.

The number after the local/toll indication is the distance in miles between
the rate center for the caller and the rate center for the POP, using the
"minor" (a/k/a "under 40") MTS coordinates.  Put more simply, it's how far
apart the phone company thinks the two points are.  Calls aren't usually
local beyond 10 or 15 miles, but there's one case in Florida where you
could make a 135-mile local call for $0.25 per call.

The next number in square brackets is the distance between the wire centers
for the caller and the POP.  In some situations the wire center distance is
used when pricing local calls.  As you can see in the example above, the
MTS coordinate distances for the first two POPs are both 0.0, but the wire
center distances are 7.6 and 5.4 miles.  Usually the two numbers are pretty
close, but because of the way some POPs are connected to the phone system,
the wc numbers can be large (perhaps 20 miles).  When tracking down
problems, it's usually best to pay attention to the first number (the MTS
distance) and ignore the wc distance.

The final item on the line is the cost of a call made for a given duration
at a specific time of day on a particular day of week with a certain
calling plan.  Sometimes we average rates from multiple carriers together,
which complicates things.  At any rate (no pun intended), it's the most
important value we use when deciding the order in which to hand out POPs.

The last line in each set shows the dialing patterns that we will try, in
the order that we will try them.  For the first entry we will try 7-digit
dialing and then 11-digit dialing (it's a home/local call); for the last
entry we just try 11-digit (it's foreign/toll).


Occasionally you will see entries that look like this:

For 205-526-0000 from LEESBURG, AL (base cost=241):
  tdsnet/205-927-6200 in or near "Centre, AL" (CENTRE, AL)
    PsuedoLocal 5.1mi  [wc=5.1mi] cost=2040  [LCA not sup]
      --> 927-6200 then 1-205-927-6200
  tdsnet/205-528-6200 in or near "Crossville, AL" (CROSSVILLE, AL)
    toll 14.5mi  [wc=14.5mi] cost=3137  [LCA not sup]
      --> 1-205-528-6200 then 528-6200

The end of the second line in each set may have a special code in square
brackets.  The most popular ones are "unsupported local" and "LCA not
sup".  When you see "unsupported local", it means that we have the LCA
(Local Calling Area) definition, but no rate information (this is the
"partially supported" LCA data mentioned earlier).  Chances are the LCA is
not getting updated regularly, but since these LCAs are usually small rural
areas, it probably doesn't *need* to get updated very often.

When you see "LCA not sup" it means we have no information at all about the
LCA for this area.  We just plain can't tell what calls are local, and have
to punt.

Well, that's not *entirely* true.  If the caller and POP are in the same
exchange area, we go ahead and assume that it's a local call.  We also have
a feature where we declare that everything within a specific radius
(currently 10 miles) of the caller in an "LCA not sup" area is local.
Since we can't determine the cost, we define them to be ExpLocal.  To make
the distinction clear, we display ExpLocal calls in "LCA not sup" areas as
"PseudoLocal".  As mentioned above, PseudoLocal is functionally equivalent
to ExpLocal; we just show it differently because the definition of "local"
is based purely on MTS distance rather than telco tariffs, and therefore is
more prone to problems.
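
The fallback rules for "LCA not sup" areas can be sketched like this in
Python.  The names are hypothetical, and two details are assumptions: that
a same-exchange call is labeled LOCAL, and that the radius test is
inclusive.  The label is spelled the way the tool output above spells it.

```python
PSEUDO_LOCAL_RADIUS_MILES = 10.0   # "currently 10 miles" per the text

def classify_unsupported(same_exchange_area, mts_distance_miles):
    """Classify a call when we have no LCA data for the caller's area."""
    if same_exchange_area:
        return "LOCAL"            # assumed label for the same-exchange case
    if mts_distance_miles <= PSEUDO_LOCAL_RADIUS_MILES:
        return "PsuedoLocal"      # ExpLocal, shown under a different name
    return "toll"

# Distances taken from the LEESBURG, AL example above.
centre = classify_unsupported(False, 5.1)
crossville = classify_unsupported(False, 14.5)
print(centre, crossville)
```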

The motivation for doing PseudoLocal was that ExpLocal calls are always
prioritized ahead of toll calls.  Because of weirdnesses in the phone
system, it may cost more to call yourself with AT&T than it would to call
the other side of the country.  Without PseudoLocal, people in some rural
areas -- who most likely had local POPs nearby -- were being told to dial
distant locations, because an AT&T call cost less, and the only rating
information we had was for the IXCs.  (You might be tempted to just do the
POP assignments by distance rather than cost, but there are many areas
where distance and cost don't correlate.  Some 50-mile calls in Florida
are more expensive than 300-mile calls into a different state.)

There's a problem with doing this, though.  Suppose we're in an area (e.g.
Florida) where local calls that cross area code boundaries require 7-digit
dialing.  Suppose further that we're in an unsupported LCA.  We're now in
the uncomfortable position of telling the box to use 7-digit dialing across
area codes, based solely on the fact that the POP is less than 10 miles
from the caller.  Fortunately it's easy to manually verify that we're not
doing bad assignments; just dial the 7-digit POP number, using the
*caller's* area code.  If you get something other than a recording, we're
in a lot of trouble.  (Turning off UnsupLCADistOnlyRadius fixes it, but
then we lose PseudoLocal, which will make us rather unpopular with some
customers.)

Ideally we would be able to add our own LCA definitions to the CCMI data,
and avoid the problems entirely.  Of 25,000 or so exchange areas, 5,000 are
completely unsupported.  Maintaining a complete set of data for areas with
a tiny handful of people isn't cost-effective, for us or CCMI, but it would
be nice if we could fix the areas where we do have some customers.


A more insidious problem has occurred in a few places, notably parts of
Texas (Grand Prairie, anyone?).  In these cases, CCMI had only one local
calling plan in the database, and it was an extended-area "metro" plan that
not all of our customers had signed up for.  The data that we got out of
CCMI showed certain POPs as being free local calls, and sure enough, they
were for everybody who had signed up for the extended plan.  The rest of
the people were a trifle irked.

The PhoneDB generation process scans the entire set of local calling plans,
and always uses the most restrictive definition.  When a wide-area plan
is the most restrictive definition of an LCA, we're in trouble.
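
Here is one plausible reading of "most restrictive", sketched in Python:
each plan is modeled as the set of exchanges it treats as local, and the
plan with the fewest exchanges wins.  This is an assumption about how the
generator behaves, not a description of its code, and the exchange lists
are invented for illustration.

```python
def most_restrictive(plans):
    """Pick the plan whose local calling area covers the fewest exchanges."""
    return min(plans.values(), key=len)

# Hypothetical plan contents.
metro = {"817-278", "972-375", "214-555"}
standard = {"817-278"}

# With both plans in the database we get the conservative answer...
both = most_restrictive({"metro": metro, "standard": standard})
# ...but if the metro plan is the only one loaded, it wins by default.
metro_only = most_restrictive({"metro": metro})
print(sorted(both), sorted(metro_only))
```

The second case is the failure described above: nothing is wrong with the
data that's there, but the answer is only right for metro subscribers.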

This sort of problem is difficult to deal with, because in these situations
the CCMI data *is* accurate.  It just happens to be incomplete.  In this
particular case I asked them to add the standard calling plan, and they
said they would look into it.  This is another scenario where being able to
tweak the local calling plan definitions would be useful.  We can do a
limited amount of fixing with the "ChangeCallCost" PhoneDB feature, but
that's clumsy at best.


There are some other odd things you might see in POP-O-Rama output, like:

For 604-523-0000 from NWESTMNSTR, BC (base cost=??):
  uunetdan/360-383-1000 in or near "Bellingham, WA" (FERNDALE, WA)
    toll?? 29.4mi  [wc=0.0mi] cost=??  [origin not in DB]
      --> 1-360-383-1000

"Origin not in DB" happens because the point of origin is in Canada, and we
don't currently have data from CCMI for calls made from Canada.  Note that
"base cost" is "??", which means that we weren't able to figure out what it
would cost for someone in 604-523 to call themselves.

For 817-278-0000 from EULESS, TX (base cost=0):
  cnc/972-375-0501 in or near "Dallas, TX" (GRAND PRAR, TX)
    ExpLocal 8.9mi  [wc=8.2mi] cost=242  [hacked!]
      --> 1-972-375-0501 then 972-375-0501

You will see "hacked!" when the kind of call and cost of the call have been
explicitly changed by the person generating the PhoneDB.  (There's probably
a better word to use than "hacked".)


All of our local cost calculations are actually based on business rate
plans.  There are residential rate plans available in the CCMI database,
but very few of CCMI's customers actually use them, so they're not as
carefully scrutinized.  A comparison of residential vs. business rates done
early in 1997 suggested that, while some areas were more accurately rated
using the residential data, other areas seemed wildly inaccurate.  The
decision was made to avoid residential rate data for now.


If you find yourself answering a phone call or an e-mail message from a
customer who claims that a POP isn't local even though we think it is,
don't jump to any conclusions without some corroborating evidence.  I
received a handful of bug reports saying that 510-742-xxxx (in Fremont)
wasn't local from Palo Alto, even though the pages in the front of the
Pacific Bell white pages showed that it was.  People in areas with low
population densities will often assume that exchanges they don't recognize
aren't local.  (This problem has returned, too: now people in 510 don't
realize that they can dial into the northern part of San Jose.  Sigh.)

Of course, it would be a bad idea to dismiss such claims out of hand.  The
best evidence is a phone bill that shows the POP as being non-local.  There
have been several cases where the phone company mis-billed a call, either
because of 11-digit dial patterns or errors on their part; with the bill in
hand we can easily get either the telco or CCMI to straighten out their
data.  If they haven't yet received a bill, a call to the business office
or even an operator at the telco that handles the call will resolve the
matter, but there have been cases where conflicting answers have come from
the same source on subsequent calls.  Also, be sure that you're talking to
the right LEC, because different carriers will have different calling
plans.

Local vs toll issues should be reported to the SOC.  If you're the one
investigating a complaint, and we don't have a phone bill to look at, you
should talk to the operator about the calls in question and ask whether
they are (1) local, (2) local but expensive (e.g. zone 3 calling), (3)
local toll, or (4) long distance.  Most operators will just say "local" for
#1 and "toll" for #2, #3, and #4 to avoid confusing the customer, but the
distinction is important for us.



-= II.F =-  POP, phone line, and network quality issues

Not all POPs are created equal.  WebTV requires that all POPs we use are
capable of 28.8Kbps communication, and we take steps to ensure that there
is adequate network capacity between our IAPs and us.  Even so, there are
cases where an individual POP or individual user will see substandard
performance.  This section provides a quick overview of symptoms and their
causes.

The most common problems are in the user's house or apartment.  Line
splitters, large numbers of phones on the same line, phone extenders that
plug into an A/C power outlet (commonly used with DSS systems), and old
wiring are common sources of problems.  They can interfere with the phone
line, resulting in slow connections.

The initial connect rate shown on the tricks-info page and in the phone
logs doesn't tell the whole story.  One of the features of modern modems is
that they will "negotiate down", or start talking more slowly, if a lot of
errors are detected.  This is done because the modems are less susceptible
to disruption at lower speeds.  If the line conditions improve, the modem
will negotiate back up.  Unfortunately, we have no way to monitor the
current speed or know the lowest speed used, so it's difficult to identify
problems just by looking at the initial connect rate.

Even so, if you see connections being established at 21600bps or lower,
there's a good chance that the user's phone connection is poor.  If many
users are reporting similar troubles with that POP, and you connect at a
slow rate when calling the same POP from here (you can do this with
Vend-A-Telly, described in a later section), there's a chance that the POP
itself is poorly connected.

Most phone companies won't guarantee connect rates of 28.8Kbps or higher.
Pacific Bell only guarantees 4800bps, which is pretty pathetic.

The box will refuse to connect at less than 14.4Kbps, but could conceivably
negotiate lower.  It may be possible to disable downward negotiation below
14.4, but it's not clear that this is always desirable.


In the very early days, before the service went public, we displayed the
connect rate right below the WebTV logo that you see before you get to the
home page.  The information was removed to avoid being swamped with calls
from customers wondering why they weren't getting the full 33.6Kbps
connections that they paid for.  The reality is that not all IAPs have POPs
that go above 28.8Kbps, and even then, most 28.8, 33.6, and 56K modem users
don't get the speed they would hope for (26.4, 31.2, and 42K are much more
common) because of noisy phone lines or other external factors.  The
reviewers of some 56K modems were unable to get actual data rates above 44K
with even the best of modems.  The worst couldn't break 30K.

When LECs won't even guarantee 14.4Kbps, it's impossible for WebTV to
guarantee anything higher.  We should make every effort to determine the
cause of poor performance, but some things are beyond our control.  If the
user has a PC with a modem that has no trouble connecting, try to get the
WebTV box configured as close to what the PC does as possible, or ask the
user to have the PC call the POP that the WebTV box is calling.  They don't
need to log in, just call the POP and watch the connect rate.


There's more to POP quality than just modem connect speed.  Everything that
the box receives has to be sent from our servers, across either the
Internet or a private network connection to the IAP, from the IAP to the
terminal server at the POP, then out through the modem and down to the
user's box.  The modem speed is a good place to start, but it's also
important to consider the network performance.

It's difficult to get a simple performance number out of the network
connections, because they may hit peaks where traffic grinds to a crawl for
short periods, may exhibit spasmodic behavior with bursts of activity
followed by long periods of silence, or may just move at a steady snail's
pace.  The easiest way to check the performance is to try to download a
large image file (say a 150K GIF or JPEG) and see how long it takes to
arrive.  This feature is also provided by Vend-A-Telly.


An issue related to POP performance is line drops.  There are a number of
reasons why the box might suddenly disconnect from the service, some of
which are discussed in a later section on "idle timeouts".  Disabling or
reducing the sensitivity of call waiting in the Dialing Options screen
resolves most problems with unexpected disconnects.

The cause of some of our troubles with call waiting is that the box doesn't
detect the call waiting "bong" accurately.  Any substantial disruption,
including somebody picking up an extension phone or a random burst of noise
on the line, will be interpreted as an incoming call.  Adjusting the
sensitivity setting will reduce false-positives and missed calls, but for
many customers the system is not 100% reliable, and never will be with the
modems built into WebTV "Classic" boxes.  It appears that "Plus" boxes will
be similarly unreliable.

Some line drops don't go away with the call waiting setting.  There have
been cases where the IAP's modems dropped the connection when a significant
amount of line noise was detected, regardless of the setting on the WebTV
box.  This can usually be corrected by the IAP.


More information on diagnosing and correcting the above should be available
from Customer Care.  This document is long enough without having a complete
troubleshooting guide in it as well.



  |                               |
-=*=-  III. Service Mechanisms  -=*=-
  |                               |


-= III.A =-  Dial overrides, Satan, and you

Dial overrides are a quick and easy way to send somebody to a particular
number with a specific dial pattern.  Unfortunately they're a little too
easy.  They can solve a problem (or at least placate a customer) quickly,
but they don't go away when the underlying problem gets solved.  In general
dial overrides are a Bad Thing, and alternate solutions should be used
whenever possible.

In the early days of the service, there was no such thing as a dial
override.  Because there was no quick solution, the problems were fixed in
other ways, or were analyzed until it was determined that the problem was
unrelated to the POP number being dialed.  This was time-consuming but very
effective at identifying the root cause of problems.

The issue that drove the existence of dial overrides was that some
customers bought special calling plans through their phone company that
allowed them to call a specific region or number for a flat rate per
month.  If the PhoneDB got updated, and their primary number changed,
they would no longer be dialing the preferred number.  We needed a way
to send people to a specific area.

The initial solution wasn't pretty, but it was the best that could be done
with the available facilities: the user's ANI of record was changed to an
NPA/NXX that had the target POP as the primary.  Since there were only two
IAPs (cnc and uunet), and load balancing was a distant dream, this worked
fairly well.  Unless, of course, the box lost power, and the user said
"yes, I've moved".

Clearly we needed something else.  The first version of dial overrides was
added a few hours after a service release had frozen, because by consensus
it had been placed on the C-grade "would be nice" list, and wasn't really
supposed to be done at all.  Consequently it was done in a big hurry.  The
database stored one override that had an ANI, a provider name, and the
exact string of digits needed for dialing the POP.  If the ANI matched, we
sent a tellyscript for that POP and provider, complete with a warning
dialog.  This mechanism quickly became popular, and eventually support for
it was added to the CMR tool.

With a little experience it became clear that the mechanism was
insufficient.  You couldn't put in an override for a box behind a store's
PBX, because the ANI value might be different each time the box logged in.
You couldn't override to an 800 number because the warning dialog would
show the 800 number (this is a bad thing, as explained in a later
section).  The override didn't go away if the POP went away.  And you
couldn't have the override go dormant if the user moved to an area with
local coverage.

The second generation of dial overrides provided for these, mostly.  It was
again done at the last minute and at a low priority.  Nearly a year later
the CMR tool still couldn't (and even now can't?) parse the new format, and
some of the features -- like disabling the override when the POP goes away
-- weren't implemented.  The only way to do the new-style overrides is with
"clientpopedit" (the first version of which, incidentally, was a truly
frightening piece of work).


There are things that can be done to make overrides less harmful.  The
trouble with them is that it will require CMR changes to make them
accessible to Customer Care.

High on the SOC's request list are "negative overrides", where you get to
specify a number (or perhaps a complete exchange area) that the user
doesn't want to call.  You can remove the POPs that the user doesn't like,
and leave all the rest in.  Another desirable item is overrides with
expiration dates, for cases where a POP is temporarily out of commission,
and the user is screaming because they're too impatient to wait for the box
to give up and try the next number.

One interesting "feature" of overrides is that they are bound to a box, not
to a subscriber.  If a user swaps a box because of defects and has their
account moved over, the dial override doesn't move with them.  This isn't
necessarily a bad thing, because the dial override might have been entered
as part of diagnosing a problematic box.  When the old box is
"unregistered" prior to adding a new account, the dial override is
purged automatically.


Whatever fancy features get added to overrides, the rule of thumb remains:
don't use them unless you absolutely need to.  And the only valid reason
for needing to is a user with a specific calling plan that we can't take
into account otherwise.

Some common abuses of dial overrides are:

 - Dial pattern fixes.  Use "dpedit" for this.  Edit the patterns, then
   if they can't get in at all, tell them to unplug and say they've moved
   so they'll go back through scriptlessd.
 - Dead POP workarounds.  Tell them to be patient, we're working on it.
   There is support in the service for temporarily removing a POP from
   everybody's tellyscripts, but it's too clumsy to use at present (the
   DisablePOP config option).
 - Slow POP workarounds.  This is harder, because the POP is connecting
   but is performing poorly.  A simple technique is to turn audible dialing
   on, then unplug the phone after the first dialing sequence completes.
   When it gives up it'll try the second number (unless they only have one
   local call, in which case it tries the first number twice).  It's a
   pain, but it works.  If they insist on getting a fix, give them the
   override but leave the trouble ticket open.  Remove the override a few
   days later when things are better and close the ticket then.
 - PhoneDB local vs. toll problems.  Using a dial override to fix these
   *temporarily* is okay, but the ticket should be left open as long as the
   override is in place.  The problem is not solved until the PhoneDB is
   correct.  When the PhoneDB is fixed, the override gets removed, and only
   then is the ticket closed.

Like the saying says, "if you don't have time to do it right, when will you
have time to do it over?"  Every dial override that gets added also has to
get removed, because sooner or later that POP will go away or more local
numbers will be added or whatever.  If everybody gets overridden to POP #2
when POP #1 gets congested, the load balancing algorithms can't do their
work, and pretty soon POP #2 is going to be congested and all those people
are going to be calling you all over again.

Customers that can get a quick fix by calling Customer Care will do so
every time their POP gets slow.  Don't encourage people to call up every
time they have the slightest problem.

Avoid quick fixes that just postpone the inevitable.



-= III.B =-  Introduction to tellyscripts

A tellyscript is a C-like program that is interpreted by the box.  Its
most important and most obvious function is to tell the box what numbers to
dial, but they do a lot of other work besides.

Most communication software uses what are known as "send/expect" scripts.
Send/expect scripts send a particular string, and then expect a certain
response.  The MacPPP configuration is a simple example: generally you send
a dial string, expect the word "Login:", send your user name, expect
"Password:", and then send your password.  The fancier versions will allow
you to expect one of several different responses, and perform different
actions based on what you get back.
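
For readers who haven't seen one, here is a toy send/expect runner in
Python.  It is a generic illustration of the concept, not MacPPP's format
or anything the box runs; the FakeLine class, the phone number, and the
login strings are all invented.

```python
def run_script(steps, line):
    """Run (send, expect) steps against `line`, which has send()/recv().

    Returns True if every expected string was seen, False on the first
    step where it wasn't.
    """
    for send, expect in steps:
        line.send(send)
        if expect is not None and expect not in line.recv():
            return False
    return True

class FakeLine:
    """Stand-in for a modem connection; replies are canned for the demo."""
    def __init__(self, replies):
        self.replies = dict(replies)
        self.last = None
    def send(self, text):
        self.last = text
    def recv(self):
        return self.replies.get(self.last, "")

steps = [
    ("ATDT5550100\r", "Login:"),     # dial, then wait for the login prompt
    ("user\r", "Password:"),
    ("secret\r", None),              # nothing to expect after the password
]
good = FakeLine({"ATDT5550100\r": "CONNECT 28800\r\nLogin:",
                 "user\r": "Password:"})
ok = run_script(steps, good)
failed = run_script(steps, FakeLine({}))   # a line that never answers
print(ok, failed)
```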

Andy Rubin thought this was a little simple-minded, so he combined the
send/expect concept with a minimal C interpreter, and named the result
after a product from his former company (General Magic).  The result was a
program that could do all the usual sending and expecting, but with the
flexibility of C code.

The current batch of tellyscripts will:

 - Initialize the modem.  All of the phone settings in the user interface,
   including things like dial speed and call waiting sensitivity, are
   put into practice by the tellyscript.
 - Update the message on the progress bar in an appropriate language while
   the box is connecting.
 - Send the appropriate login and password to one or more of several
   different ISPs (including OpenISP ISPs).
 - Parse all modem result codes, and convert them into connect rate and
   protocol values for display by the box (like on the tricks-info page).
 - Do some really funky things involving NVRAM and phone settings.
 - Combine dial prefixes, including the special "only for long-distance
   calls" prefix on the Obscure Dialing Options page.
 - Work around bugs in certain versions of the modem firmware.
 - Deal with several different failure modes, and return appropriate
   error status codes.
 - Post "this may be a toll call" alert dialogs.
 - Dial POPs several times and in different orders, moving on to the next
   when one fails.
 - With POPtimization, use one of up to eight different *sets* of POPs
   based on day of week, time of day, and what month it is.
 - Set the primary and secondary name servers that the box uses when in
   proxy-less mode.
 - Send and expect.

Each tellyscript is divided into four sections.  The pieces are combined
on the service, and the full script is then tokenized and compressed before
being sent to the client.  On disk, the files have a ".tsf" extension, which
stands for TellyScript Fragment.  The four sections are:

 base.tsf - common functions.
 locale.tsf - country-specific features (e.g. Japanese connect messages).
 <iap>.tsf - one or more tellyscript fragments, one per IAP.  These are
   named after the IAP, so CNC's .tsf file would be called cnc.tsf.  These
   are very short; usually they just have the IAP's Radius login info.
 <generated> - tellyscript code generated on the fly.  This is where the
   actual phone numbers and "this may be a toll call" warnings go.

The combined size of the four sections is about 40K when in C code form.
This boils down to about 12K when tokenized, and 5K when compressed.


When the service sends a script down, it saves a blob of information that
looks like this (line broken in half for readability):

    0x34567117-0x4abf9aa7-base:36:-|locale:2:-|__wpb:1:3261095|__cnc:2:6870610|
      __wpb:1:3261095|__cnc:2:16506870610|__artemis:1:18006108918

Translated into human-readable form, it looks like this:

    Hash 0x4abf9aa7, sent Tue Oct 28 15:02:17 1997
    v36 base/-
    v2  locale/-
    v1  wpb/3261095
    v2  cnc/6870610
    v1  wpb/3261095
    v2  cnc/16506870610
    v1  artemis/18006108918

The "vN" part tells you what version of the script was sent down.  We gave
the user version 36 of base.tsf, version 2 of locale.tsf and cnc.tsf, and
version 1 of wpb.tsf and artemis.tsf.  The "sent Tue Oct ..." part tells
you when the script was sent down, and the numbers after the providers'
names show you the exact string of digits that the box is going to dial.

(In the example, the user has the wpb/650-326-1095 and cnc/650-687-0610
POPs.  He will use 7-digit dialing on both wpb attempts, but will try
7-digit dialing on the first cnc attempt and 11 on the second.  This user
has apparently established a 7-digit dial pattern for the wpb POP, but
hasn't yet determined the pattern to use for the cnc POP.)

If the user were given a toll warning message, the first line for the
provider would look something like this:

    v2  wpb/3261095 {toll warning sent}

and "__wpb:1:3261095" would be "_W_wpb:1:3261095" (with a 'W' up front).
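
The translation from blob to readable form can be sketched in a few lines
of Python.  This is only an illustration: the function names are made up,
and the assumption that the first two dash-separated fields are the send
time and the hash comes from lining the blob up against the listing above.
The send-time field is carried through untouched rather than decoded.

```python
def parse_script_record(blob):
    """Split a saved script record into (sent_field, hash_field, entries)."""
    # Assumption: the first two "-"-separated fields are the send time and
    # the hash; everything after that is the "|"-separated fragment list.
    sent_field, hash_field, body = blob.split("-", 2)
    entries = []
    for item in body.split("|"):
        if not item:
            continue  # tolerate a trailing "|"
        name, version, digits = item.split(":")
        toll_warning = name.startswith("_W_")
        if toll_warning:
            name = name[len("_W_"):]
        elif name.startswith("__"):
            name = name[len("__"):]
        entries.append((name, int(version), digits, toll_warning))
    return sent_field, hash_field, entries


def describe(blob):
    """Render the record roughly the way the human-readable listing does."""
    _sent_field, hash_field, entries = parse_script_record(blob)
    lines = ["Hash %s" % hash_field]
    for name, version, digits, toll_warning in entries:
        line = "v%-3d%s/%s" % (version, name, digits)
        if toll_warning:
            line += " {toll warning sent}"
        lines.append(line)
    return lines


blob = ("0x34567117-0x4abf9aa7-base:36:-|locale:2:-|__wpb:1:3261095|"
        "__cnc:2:6870610|__wpb:1:3261095|__cnc:2:16506870610|"
        "__artemis:1:18006108918")
print("\n".join(describe(blob)))
```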

The "Hash 0x4abf9aa7" part is the key to getting tellyscripts updated.
This number is a (hopefully unique) representation of the big blob.  It's
sent down to the box with the tellyscript and handed back up on every
connection.  When the box reaches the headwaiter, we recompute the
tellyscript that they should have, and compare the new hash value with the
box's hash value.  If any part of the blob changes, the new "hash" value
will be different, and we know that they need a new script.

This means that if a provider or dial pattern changes, a tellyscript
fragment gets updated, or a toll warning dialog is added or removed, the
service will automatically send the box a new tellyscript.  Since the box
tells the service what it has, there's no risk of the service thinking that
the box has a different tellyscript than it actually has.  (Which,
incidentally, is a real problem, because the box doesn't save the
tellyscript into NVRAM until the box is powered off with the remote control
or keyboard.  If the box crashes or loses AC power before the tellyscript
is written, or the user hits the reset button on a "Classic" box, the
previous tellyscript will be used on the next connect.  For this reason,
the service tracks the *two* most recently sent tellyscripts.)
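
For the curious, the hash comparison can be sketched as below.  The real
hash function isn't described here, so CRC-32 is used purely as a stand-in,
and the class and function names are hypothetical.  The point is only to
show the shape of the check: hash what the box should have, compare with
what the box says it has, and remember the last two scripts sent.

```python
import zlib


def script_hash(description):
    """Stand-in hash (CRC-32); the service's real function is not given."""
    return zlib.crc32(description.encode("ascii")) & 0xFFFFFFFF


class SentScripts:
    """Track the two most recently sent script descriptions for one box."""

    def __init__(self):
        self.recent = []  # newest first, never more than two

    def record(self, description):
        self.recent = [description] + self.recent[:1]

    def box_is_running(self, box_hash):
        """Return the remembered description matching the box's hash."""
        for description in self.recent:
            if script_hash(description) == box_hash:
                return description
        return None


def needs_new_script(box_hash, wanted_description):
    """True when the box's hash differs from the script it should have."""
    return box_hash != script_hash(wanted_description)


old = "base:36:-|locale:2:-|__wpb:1:3261095"
new = "base:36:-|locale:2:-|_W_wpb:1:3261095"   # toll warning added
history = SentScripts()
history.record(old)
history.record(new)

# The box lost power before saving the new script, so it reports the old hash.
box_hash = script_hash(old)
print(needs_new_script(box_hash, new))          # the box needs a resend
print(history.box_is_running(box_hash) == old)  # and we know what it has
```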

Most people don't need to understand the above in detail.  Either trust
that the system works, or read the above until you're convinced (one way
or the other).



-= III.C =-  Call ordering and the fallback number

Once we've chosen the POPs and checked the available dial patterns, we have
to dial the phone.  We know which POP to try first, but should we do the
first POP twice in a row and then do the second, or alternate between the
first and second?  What if we have one POP or three POPs?

The call ordering depends on how many POPs they have and what kind of call
each is.  In every script, we bail out when we connect successfully or if
we are unable to detect dialtone before dialing.  "Black holes", where we
connect successfully but then are unable to talk to the WebTV service, are
handled specially (explained later).

If we only have one POP:

 1. try number
 2. IF we have a secondary dial pattern, try it; otherwise skip this step
 3. retry number
 4. call 800 fallback

If we have two POPs and both have the same cost (i.e. both are LOCAL, or
both are ExpLocal or toll but have the same estimated cost):

 1. try pop#1
 2. try pop#2
 3. retry pop#1, using secondary dial pattern if it exists
 4. retry pop#2, using secondary dial pattern if it exists
 5. call 800 fallback

If one POP is more expensive than the other (perhaps one local and one
toll):

 1. try pop#1
 2. retry pop#1, using secondary dial pattern if it exists
 3. try pop#2
 4. IF we have a secondary dial pattern for pop#2, try it; otherwise skip
 5. call 800 fallback

In no case do we try more than 5 numbers, and we don't try a more expensive
number more than once unless we're trying to figure out what the correct
dialing pattern is.  The service doesn't yet support three POPs, so the
call ordering for that situation isn't shown here.
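
The three orderings above can be written out as a small Python sketch.  The
"Pop" class, its field names, and the single numeric "cost" used to decide
whether two POPs cost the same are all assumptions made for illustration;
the real tellyscript generator is not shown in this document.

```python
from dataclasses import dataclass
from typing import List, Optional

FALLBACK = "800 fallback"


@dataclass
class Pop:
    name: str
    primary: str                     # digits for the first dial pattern
    secondary: Optional[str] = None  # digits for the second pattern, if any
    cost: int = 0                    # hypothetical comparable call cost


def call_order(pops: List[Pop], include_fallback: bool = True) -> List[str]:
    """Return the digit strings to dial, in order, for one or two POPs."""
    if len(pops) == 1:
        (p,) = pops
        order = [p.primary]
        if p.secondary:
            order.append(p.secondary)
        order.append(p.primary)              # retry the number
    elif len(pops) == 2:
        a, b = pops
        if a.cost == b.cost:
            # Alternate between the two, then retry each one.
            order = [a.primary, b.primary,
                     a.secondary or a.primary,
                     b.secondary or b.primary]
        else:
            # Use up the cheaper POP before touching the expensive one.
            order = [a.primary, a.secondary or a.primary, b.primary]
            if b.secondary:
                order.append(b.secondary)
    else:
        raise ValueError("only one or two POPs are covered here")
    if include_fallback:
        order.append(FALLBACK)
    return order


wpb = Pop("wpb", "3261095")
cnc = Pop("cnc", "6870610", "16506870610")
print(call_order([wpb, cnc]))
```

With the wpb and cnc POPs from the earlier script example, this produces
the same dialing sequence that appeared in the saved script blob, followed
by the fallback.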

We show the toll warning dialog before the first time we call an ExpLocal
or toll POP.  The warning contains the number to be dialed and the city
name where the POP lives, using the "nice" form of the city name.


The toll-free fallback number, sometimes called "fallover" or "failover",
has been around since the early days of dialing.  The idea was to prevent
certain kinds of failures, such as POP outages or number assignment
glitches, from giving the service a bad name.

It is important to remember that nowhere in the Terms of Service does it
guarantee connectivity, and we have never promised customers that they
would have unlimited toll-free access at our expense.  The fallback number
is supported as a courtesy, and may go away or have its use restricted at
any time and without notice.


The 800 fallback number will be omitted in certain circumstances.  The most
significant one is called the "AllTollNoRoll" feature.  It was added
because some users without local POPs had, strangely, neglected to order
long distance service on their WebTV line.  Every POP number would fail,
until the box called the fallback number.  The easiest way to avoid this
situation was to leave the fallback number out of tellyscripts for users
with nothing but toll calls.

A similar situation existed for a customer with phone service that only
allowed calls to 800 numbers and 911 (Universal Lifeline Service?).  In
this case, not even local calls could be made, so despite having two local
POPs the user ended up on the fallback number every time.  The cure for
such users (besides asking them to get a real phone line) is the "disable
fallback" flag in the customer's account.  It should be possible to set
this from the CMR tool.
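
Put together, the two omission rules amount to something like the sketch
below.  The function and argument names are hypothetical, and only POPs
marked "toll" are counted for AllTollNoRoll because that is all the text
above states; how ExpLocal POPs are treated is not covered.

```python
def include_fallback(call_types, fallback_disabled=False):
    """Decide whether the 800 fallback goes into a user's tellyscript.

    call_types is a list such as ["LOCAL", "toll"], one entry per POP.
    """
    if fallback_disabled:
        return False          # "disable fallback" flag on the account
    if call_types and all(t == "toll" for t in call_types):
        return False          # AllTollNoRoll: nothing but toll calls
    return True


print(include_fallback(["LOCAL", "LOCAL"]))                          # True
print(include_fallback(["toll", "toll"]))                            # False
print(include_fallback(["LOCAL", "LOCAL"], fallback_disabled=True))  # False
```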

Of course, it's always possible for users to disrupt the dialing sequence
several times until the box dials the 800 number.  For most people this is
unnecessary and inconvenient: if they didn't have (in CCMI's and our
opinion) a local call, they wouldn't have the fallback number in their
script, so either they'll never get to the fallback number or they're
trying really hard to avoid making local calls.  We can identify such users
through usage reports, and deal with them on an individual basis as
necessary.

A recent development in the service is the 800 fallback usage cap.  This is
explained later.

Allowing calls on the fallback number to be billed at an hourly rate for
customers without local POPs has been suggested.  It may be implemented in
a future release of the service.


"Black hole" is the WebTV term for a POP that accepts modem connections but
is unable to carry network traffic between the box and service.  The
tellyscript believes it has made a successful connection, but the box is
unable to do anything after getting connected.  Early boxes (pre-client
1.1) would connect to black hole POPs and stay there until disconnected by
a timeout or an impatient user.  As of client 1.1, the box will try to
connect to the service for a minute and a half.  If it is unable to get a
response from the headwaiter in that time, it will disconnect, then restart
the tellyscript at the point where it left off.

(There was a fun bug related to black holes, where a box would get
connected successfully but not realize it.  This usually happened during
registration.  After being connected for about a minute and a half, the box
would spontaneously disconnect and redial the service.)



-= III.D =-  The "clientinfo" command

The "clientinfo" tool is a UNIX shell command.  It got its name because the
database DEVICE table entries are referred to as "Client" structures in the
service.  The tool was written to dump certain fields from the Client
structure, but it has grown beyond that.

(For those of you not up on your database lingo, the "device" entry is
linked to a physical box, and has a "subscriber" associated with it.  When
you move a user's account from one box to another, you are changing the
link to make the subscriber associated with a different device.  The device
entry is usually created as part of the manufacturing process so that we
can get the back-of-unit serial numbers into the database, but if it
doesn't exist it will be created by scriptlessd when the box first
connects.  The subscriber is always created by registerd when registration
is complete.)

There are several sections in the clientinfo output.  The first is the
PhoneDB version info:

-----
Using PhoneDB v25 USA (built Mon Sep  8 23:11:48 1997 by uid=1057)
  Features: [CCMI] [com] [ld-avg] [wlca] [zd] [lec]
  This PhoneDB is for personal services ONLY
    PhoneDB_Map()             -/-
    PhoneDB.c:146             (unknown)[15617]                  10/28 16:47:24
-----

This tells you what version the PhoneDB is, whether it was built for the US
or for a foreign country like Japan, when it was built, who built it, and
what features were enabled.  You don't usually need to worry about this,
but keep an eye out for bad dates.

Next comes the options header:

-----
--- Client info for serial '01100f7401000004' ---

  ANI ....................... 99 650-614-5539 (PALO ALTO, CA)
  Shared secret ............. 'PVKwgp8nv44='
  Script locked? ............ no
  Fallback disallowed? ...... no
  Revisit scriptlessd? ...... no
  Call Waiting Threshold .... 0 (not set)
  PSI account ............... 0
  AppROM/bootROM versions ... v3049/v2046
  Last successful connect ... 324-0657
  Category .................. normal
-----

Some of the entries are self-explanatory.  For the others: "Script locked"
indicates that scriptlessd handled the box specially for some reason, and
doesn't want the headwaiter to override the tellyscript.  "Fallback
disallowed" blocks access to the fallback number.  If "Revisit scriptlessd"
is set, the box will reboot and go back through scriptlessd the next time
it connects to the headwaiter.  "PSI account" tells you if an account has
been created with PSI for this user, which isn't something you really need
to worry about.  The "Call Waiting Threshold" field is currently
unsupported.

The "Category" field is a little funny.  It was added so that we could put
certain users into a specific category before they had registered.

After that we see the tellyscript description.  We saw one of these earlier:

-----
  Most recent script sent to client:
    Hash 0x7717cd29, sent Tue Oct 28 13:58:09 1997
    v36 base/-
    v2  locale/-
    v1  wpb/3261095
    v2  cnc/6870610
    v1  wpb/16503261095
    v2  cnc/16506870610
    v1  artemis/18006108918
  Previous script sent to client:
    Hash 0x6da221d6, sent Tue Oct 28 13:45:04 1997
    v36 base/-
    v2  locale/-
    v4  psi/14062473000 {toll warning sent}
    v4  psi/2473000
    v3  uunet/18013991119 {toll warning sent}
-----

After that we have the set of known dial patterns:

-----
  Established dialing patterns:
    ANI 650-463-1671 + POP 650-326-1095 --> mode=7-digit
-----

Naturally we don't have a dial override, but if we did, it would look like
this:

-----
  Dialing overrides:
    ANI          POP          Cst? Dlg? Lnk? ONL? Provider    Digits
    650-463-1671 650-326-1095  N    N    Y    N   wpb         '3261095'
-----

"ANI" should be obvious.  "POP" is the full 10-digit POP number.  "Cst?" is
set if it's a pick-yer-POP override (explained later); "Dlg?" is set if a
warning dialog should be shown; "Lnk?" means the override should be linked to
the POP, and should go away if the POP goes away [currently unsupported];
and "ONL?" is set if the override should be used Only when the user has No
Local POPs [currently unsupported].  The "Provider" field says who owns the
POP, and "Digits" is the actual string of digits to use.  The output is
similar to what "clientpopedit" shows.

After this come the load-balanced POP assignments for this user (you can
see the non-load-balanced version on the POP-O-Rama page):

-----
  POPs we would assign to this user (with load-balancing):
    psi/650-390-0900 (MOUNTAINVW, CA) 5.7mi  cost=240 (wc=3.5mi)
      (tries 390-0900 then 1-650-390-0900)  LOCAL*
    wpb/650-326-1095 (PALO ALTO, CA) 0.0mi  cost=240 (wc=2.2mi)
      (tries 326-1095 then 1-650-326-1095)  LOCAL
    cnc/650-687-0610 (PALO ALTO, CA) 0.0mi  cost=240 (wc=29.7mi)
      (tries 687-0610 then 1-650-687-0610)  LOCAL
    ziplink/650-687-2255 (PALO ALTO, CA) 0.0mi  cost=240 (wc=29.7mi)
      (tries 687-2255 then 1-650-687-2255)  LOCAL
    compuworld/415-423-0070 (REDWOOD CY, CA) 4.7mi  cost=240 (wc=7.0mi)
      (tries 1-415-423-0070)  LOCAL
    uunet/650-687-0796 (PALO ALTO, CA) 0.0mi  cost=240 (wc=29.7mi)
      (tries 687-0796 then 1-650-687-0796)  LOCAL
    ziplink/650-429-2255 (MOUNTAINVW, CA) 5.7mi  cost=240 (wc=29.7mi)
      (tries 429-2255 then 1-650-429-2255)  LOCAL
    ziplink/650-226-2255 (SANCLSBLMT, CA) 7.0mi  cost=240 (wc=9.2mi)
      (tries 226-2255 then 1-650-226-2255)  LOCAL
-----

We currently compute eight entries for every NPA/NXX.  The load balancing
is explained later.  Each pair of lines has most of the information
included in the POP-O-Rama output, but in a slightly different format.  See
the earlier section on local calling for the POP-O-Rama explanation.

After this is the POPtimization data:

-----
  POPtimized assignments:
  MONTH Oct 1997
    DAYS SMTWRFS
      TIMES 00:00 - 00:00
        POP 1 0:650-326-1095 conn=F
        POP 2 1:650-687-0610 conn=P
        POP 3 2:650-687-2255 conn=H
  MONTH Nov 1997
    DAYS SMTWRFS
      TIMES 00:00 - 00:00
        POP 1 0:650-326-1095 conn=F
        POP 2 1:650-687-0610 conn=P
        POP 3 2:650-687-2255 conn=H
-----

This is also explained later.

The nice thing about clientinfo is that it tells you what they *are*
dialing, what they *were* dialing, and what they *would be* dialing, all
in one place.  POP-O-Rama can show you the set of POPs that the service has
to choose from for a particular area, but can't tell you which ones will be
given to a specific user, because the actual assignment depends on the box
serial number.


A potentially useful option for SOC folks is the "-t" flag, which causes
clientinfo to write the tellyscript to stdout.  If you want to see what
tellyscript the user would get if they showed up right now, run "clientinfo
-t <serial-number> > script.out".  The output is tokenized but not
compressed, so it's hard to read but you should still be able to find the
phone numbers.  "strings -a script.out" may be helpful.  Note that there
are always two copies of the phone number, a 10-digit version with dashes
(e.g. 650-326-1095) and the actual number dialed with no dashes (e.g.
3261095).  If you're trying to see if a dial pattern has taken hold, be
sure you're looking at the right set of numbers.
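
If "strings" isn't handy, a few lines of Python can do roughly the same
job and sort the two kinds of numbers apart.  This is a sketch only: the
tokenized format isn't documented here, so the code just hunts for
printable runs the way "strings -a" does, and the sample bytes at the
bottom are made up rather than real clientinfo output.

```python
import re

DASHED = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")          # e.g. 650-326-1095
DIALED = re.compile(r"(?<![\d-])\d{7,11}(?![\d-])")    # e.g. 3261095


def printable_strings(data, min_len=4):
    """Rough equivalent of `strings -a`: runs of printable ASCII bytes."""
    pattern = rb"[ -~]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


def find_numbers(data):
    """Return (dashed 10-digit numbers, undashed dial strings)."""
    dashed, dialed = [], []
    for text in printable_strings(data):
        dashed += DASHED.findall(text)
        dialed += DIALED.findall(text)
    return dashed, dialed


# Made-up bytes standing in for the contents of script.out.
sample = b"\x01\x02650-326-1095\x00\x073261095\x00\xffab"
print(find_numbers(sample))
```

To run it against a real dump, read the file in binary mode, e.g.
find_numbers(open("script.out", "rb").read()).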



-= III.E =-  Vend-A-Telly

Vend-A-Telly is a web page attached to the "WebTV Tricks" page in the
service.  From there you can tell your box to dial any POP from any
provider.  You can even include modem AT commands as part of the dial
string; these will override some of the features that are usually set by
the box, so use only with caution.

The page should be used whenever a POP is suspected of being flaky or
slow.  You can enter the POP number, dial in, check the connect rate, and
download a large test image to see if the network is slow.

If the POP is dead or deathly slow, DO NOT give the user a dial override
unless you leave an open trouble ticket in Remedy that will allow somebody
to remove the override when the POP gets better.  Only when the override is
removed should the matter be marked as "resolved".  Network congestion is a
fact of life; moving users between POPs will most likely just make the
problem move with the users.

Troublesome POPs should be reported to the SOC.



-= III.F =-  Visible Dialing

The current generation of WebTV boxes will display the phone number being
dialed as part of the connection progress messages.  In the early days,
because of some weird sense of paranoia, the box didn't tell you what it
was dialing.  (This same paranoia accounts for the XXXXs over the last four
digits of the phone numbers in the WebTV Phone Book on our web site.)

Version 1.1 and later clients support "visible dialing", where we show the
phone number to the user as we dial it.  It got its name because there was
concern that showing phone numbers was a user interface aberration, and
people would become greatly disturbed if the deep inner workings of the box
were revealed.  For this reason we only displayed the phone number when
"Audible Dialing" was turned on; hence the nickname "visible dialing".

As it happens, people really like knowing what the box is doing with their
phone line, and are better able to identify local/toll problems before they
get a huge phone bill.  In some cases though we want to mask the phone
number, such as when calling a toll-free number.  Here are some examples:

  visible dialing off (also v1.0 clients and "Classic" boxes doing upgrades):
    "Dialing WebTV"
  normal case:
    "Dialing 14156145539"
  normal case, with a prefix of "9":
    "Dialing 9,14156145539"
  dialing a toll-free POP (e.g. the fallback number):
    "Dialing WebTV..."
  access number "324-0657" used:
    "Dialing A/N 324-0657"

Toll-free POPs have numbers starting with "1800" or "1888".  (Yes, it's
checked before the "remove leading 1" function is handled.)  If someone
puts in an override with clientpopedit that starts with "1-800" instead of
"1800", the user is going to be able to see the number.  Appropriately
nasty warnings have been added to clientpopedit.

The call waiting disable prefix will also be shown.  If you have too many
numbers to display in the field, the end will be cut off, and "..." will be
displayed.
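
The display rules above boil down to something like the following Python
sketch.  It is not the box's code: the function name and arguments are made
up, the field width ("max_chars") is a guess, and the order in which the
access-number and toll-free cases are checked is an assumption.  It does
show why "1-800" slips past the toll-free mask: the check is a plain
string prefix match.

```python
def dialing_message(number, prefix="", access_number="",
                    visible=True, max_chars=24):
    """Build the progress message shown while the box dials."""
    if not visible:
        return "Dialing WebTV"
    if access_number:
        return "Dialing A/N " + access_number
    # Checked against the raw digits, before any leading "1" is removed.
    if number.startswith(("1800", "1888")):
        return "Dialing WebTV..."
    shown = prefix + "," + number if prefix else number
    if len(shown) > max_chars:
        shown = shown[:max_chars] + "..."   # too long for the field
    return "Dialing " + shown


print(dialing_message("14156145539"))
print(dialing_message("14156145539", prefix="9"))
print(dialing_message("18006108918"))
print(dialing_message("1-800-610-8918"))   # override typed with dashes
```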



  |                           |
-=*=-  IV. Dialing Details  -=*=-
  |                           |


-= IV.A =-  PhoneDB details

You may have noticed when looking at POP-O-Rama that the POPs aren't always
sorted in the order you'd expect.  In a boring world we could sort by cost
and distance and walk away, but in the exciting world of WebTV we don't have
that luxury.

The first complicating factor is the amount that the provider costs us to
use.  Some providers are less expensive than others, or simply have more
capacity, and as a result are given a higher priority during PhoneDB
generation.  Some POPs from the same provider may be more expensive than
others.  This cost is sometimes referred to as a "static priority".  If two
calls have the same cost and MTS distance, we sort based on the provider
cost.
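
In other words, provider cost acts as a tie-breaker.  A minimal sketch of
that sort, assuming a hypothetical "Candidate" record and made-up provider
cost values (lower meaning cheaper for us), looks like this:

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    pop: str
    call_cost: int       # what the call costs the user
    miles: float         # distance to the POP
    provider_cost: int   # hypothetical "static priority"; lower is better


def sort_candidates(candidates):
    """Sort by call cost, then distance, then provider cost."""
    return sorted(candidates,
                  key=lambda c: (c.call_cost, c.miles, c.provider_cost))


# The provider_cost values below are invented for the example.
candidates = [
    Candidate("psi/650-390-0900", 240, 5.7, 0),
    Candidate("wpb/650-326-1095", 240, 0.0, 2),
    Candidate("cnc/650-687-0610", 240, 0.0, 1),
]
for c in sort_candidates(candidates):
    print(c.pop)
```

The two Palo Alto POPs tie on cost and distance, so the one from the
(supposedly) cheaper provider comes first; the Mountain View POP sorts
last on distance even though its provider cost is the lowest.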


A second factor is failure containment.  If one of our major providers had
a serious network outage affecting half the country, it wouldn't be very
useful for a user to have a tellyscript with several POPs from the same
provider.  If a backbone gets backhoed, all the POPs are going to be
useless.  For this reason we try to hand out POPs from multiple providers
whenever possible.

Priority is given to leaving the primary provider in place, but the later
POPs are shuffled around freely so long as they are listed as LOCAL calls
and the provider costs us the same amount.  We try to get a mix of
different providers in the first few POPs, so that users will have numbers
from more than one IAP whenever possible.  This is known as "provider
interleaving".

Toll and ExpLocal calls aren't subject to provider interleaving.


One of the more troublesome aspects of all this POP shuffling is dealing
with providers who charge us a flat rate per user.  Every month, certain
IAPs charge us a fixed amount for each user who touches their system, even
if the user only logged in once.  If we gave a flat-rate IAP as a secondary
POP to a customer with a very good primary, and the primary failed once at
any point during a month, we would have to pay the full charge for that
user for that one call.  Clearly, we only want to give flat-rate IAPs out
as primaries.

This is where things start to get messy (it gets worse in the next
section).  Ensuring that the second POP isn't a flat-rate IAP can require
making some tough choices.  For example, suppose that the first three POPs
listed for an NPA/NXX are an hourly-rate LOCAL, a flat-rate ExpLocal, and
an hourly-rate toll.  The initial POP layout looks like this:

 1. hourly-rate LOCAL
 2. flat-rate ExpLocal
 3. hourly-rate toll

However, we can't leave the flat-rate in the second position.  We can't put
it in the primary position, because that would take the local call away,
and if we swap it with the toll call we replace a relatively inexpensive
secondary with a nasty toll one.  In cases like these, we do the latter.

Because they can cause expensive calls to move ahead of less expensive
ones, flat-rate IAPs are marked with an asterisk in POP-O-Rama output
(i.e. "LOCAL*", "ExpLocal*", or "toll*").

Flat-rate IAP assignments have an unfortunate tendency to undo POP cost
ordering, provider interleaving, and some of the load-balancing measures
described in the next section.  The problem is alleviated by "hybrid" IAPs,
which can be used as either flat-rate or hourly-rate.  For hybrid-billed
IAPs, we treat the call as flat rate if it's the primary POP, and hourly
rate if it's not.  This gives us the price savings of a flat-rate IAP with
the flexibility of an hourly-rate IAP.
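
The flat-rate rule and the hybrid treatment can be sketched as follows.
This is a simplification under stated assumptions: POPs are plain
(label, billing) pairs, the function names are made up, and the only repair
attempted is the swap described above (trade a flat-rate secondary for the
next POP that isn't flat-rate).

```python
def effective_billing(billing, slot):
    """Hybrid IAPs count as flat-rate in slot 0 and hourly elsewhere."""
    if billing == "hybrid":
        return "flat" if slot == 0 else "hourly"
    return billing


def fix_secondary(pops):
    """Keep a flat-rate IAP out of the secondary slot, if we can.

    pops is a list of (label, billing) pairs, with billing one of
    "hourly", "flat", or "hybrid".
    """
    pops = list(pops)
    if len(pops) > 1 and effective_billing(pops[1][1], 1) == "flat":
        for i in range(2, len(pops)):
            if effective_billing(pops[i][1], i) != "flat":
                pops[1], pops[i] = pops[i], pops[1]
                break
    return pops


layout = [("hourly-rate LOCAL", "hourly"),
          ("flat-rate ExpLocal", "flat"),
          ("hourly-rate toll", "hourly")]
for label, _billing in fix_secondary(layout):
    print(label)
```

A hybrid IAP in the second slot is left where it is, since it bills as
hourly there.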



-= IV.B =-  Intro to POP load balancing and provider rotation

The previous section talked about how POPs may be shuffled while the
PhoneDB is being created.  There are some further things that the service
does before sending the POPs down to the client.


In some situations our choice of POPs is limited, and we have no choice but
to give out two local POPs from a particular provider.  If we just used the
assignments straight out of POP-O-Rama, we would end up sending everybody
to the first POP, and nobody to the second (provider interleaving will in
most cases put a POP from a different IAP in the second slot, rather than
the second POP from the same IAP).  To avoid this situation we use
"provider rotation".

Provider rotation is a simple form of load balancing.  If there are two
local POPs from the same provider, it ensures that each will get no more
than 50% of the traffic.  If there are three, each gets 33%, and so on.
This is done by using the last byte of the silicon serial number to choose
between the available options.

The rotation code swaps the primary POP with one of the others.  Nothing
else is changed.  The POPs must be from the same provider, have the same
cost for us, and must be LOCAL.


One of the limitations of the data in the PhoneDB is that it operates on
entire exchange areas.  If the PhoneDB assigns wpb/650-326-1095 as the
primary POP for Palo Alto, everybody in Palo Alto will hit that POP.  In an
attempt to avoid swamping some POPs with users while ignoring others, a
simple load balancing system was implemented.  As usual, it was done at the
last minute and in a big hurry.

The basic idea is that we carve up the POP assignment pie into pieces.
Some of the providers get a piece, some don't.  Each piece can be a
different size.  The last digit of your silicon serial number (which
happens to be a checksum with a very nice distribution over the set of our
users) determines which piece you're a part of.  If the tellyscript
generator can find a LOCAL POP from that provider, it makes that your
primary POP; if not, nothing changes.
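
A rough Python sketch of the pie follows.  Several things here are
assumptions: the pieces are expressed in 256ths, the bucket is taken from
the last byte (two hex digits) of the serial number, the function names are
invented, and the chosen POP is swapped with the primary.  Flat-rate
restrictions are ignored.  The third serial number in the example is made
up.

```python
def pick_provider(serial_hex, pieces):
    """Map a box to a provider's piece of the pie, or None.

    pieces is a list of (provider, share) pairs, with shares in 256ths
    that sum to at most 256.
    """
    bucket = int(serial_hex[-2:], 16)   # last byte of the serial number
    upper = 0
    for provider, share in pieces:
        upper += share
        if bucket < upper:
            return provider
    return None                         # no piece covers this box


def apply_balancing(serial_hex, pieces, pops):
    """Swap the first LOCAL POP from the chosen provider into slot 0.

    pops is a list of (provider, call_type) pairs in PhoneDB order.
    """
    provider = pick_provider(serial_hex, pieces)
    pops = list(pops)
    for i, (pop_provider, call_type) in enumerate(pops):
        if pop_provider == provider and call_type == "LOCAL":
            pops[0], pops[i] = pops[i], pops[0]
            break
    return pops                         # unchanged if nothing matched


pieces = [("cnc", 128), ("uunet", 128)]          # "50% cnc, 50% uunet"
pops = [("psi", "LOCAL"), ("ziplink", "LOCAL"), ("uunet", "LOCAL")]
print(apply_balancing("01100f7401000004", pieces, pops))  # cnc piece
print(apply_balancing("01100f74010000c0", pieces, pops))  # uunet piece
```

The first box lands in the cnc piece, but there is no cnc POP in its list,
so nothing moves.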

The initial implementation had one definition of the pieces for the entire
country.  Several months later, the system was enhanced to allow the pieces
to be defined in individual exchange areas, which came in handy when trying
to put Bay Area people on the "wpb" (WebTV PacBell) POP.

As you may have noticed, the system is less than perfect.  For example, if
the load balancing parameters say "50% cnc, 50% uunet", and the users in a
particular area have nothing but psi and ziplink, they won't be affected at
all.  Chances are they'll all be piled on top of the same primary POP, and
the next local POP will always be listed as the secondary.  (Yes, they'll
end up spilling over onto the secondary when the primary fills, but it's so
much nicer to not have to wait for the "all circuits are busy" timeout.)
This scheme is expected to be replaced by the POPtimization system,
described later.

As mentioned earlier, flat-rate IAPs will cause problems for us.  For
example, suppose we had three local POPs:

    PSI (flat-rate local)
    ZipLink (hourly-rate local, sort of)
    UUNET (hourly-rate local)

Suppose the load balancing algorithm says we should use UUNET as our
primary.  The POPs above would get rearranged to be UUNET, then ZipLink.
PSI wouldn't be used, because it's in the 3rd position, and we're currently
only using two POPs per tellyscript.  If, on the other hand, the initial
arrangement were:

    PSI (flat-rate local)
    UUNET (hourly-rate local)
    some hourly-rate toll number

it would be difficult to rearrange, because we can't make PSI the
secondary, and we don't want to give the user a toll number when they have
two local ones.

Refusing to rearrange POPs like the above could lead to situations where a
flat-rate provider receives a much heavier load in a certain area than we'd
like.  To deal with this, the configuration file allows a "tenacity"
setting to be adjusted.  The primary can be left alone, moved into the
secondary slot, or swapped with a more expensive toll call.  This decision
applies globally.  The default is to leave it alone; in the above case, PSI
would still be the primary and UUNET the secondary.

The setting also affects what happens when *all* of a user's local POPs are
flat rate.  The default behavior is to go ahead and give them the local
POPs anyway.


Here's a real-life example from an old PhoneDB:

For 510-799-0000 from HERCULSROD, CA (base cost=240):
  psi/510-848-1398 in or near "Berkeley, CA" (OAKLAND, CA)
    LOCAL* 10.1mi  [wc=10.1mi] cost=240 
      --> 848-1398 then 1-510-848-1398
  uunet/510-982-1757 in or near "Berkeley, CA" (OAKLAND, CA)
    LOCAL 10.1mi  [wc=17.1mi] cost=240 
      --> 982-1757 then 1-510-982-1757
  psi/510-254-7549 in or near "Orinda, CA" (ORINDA, CA)
    LOCAL* 10.8mi  [wc=10.1mi] cost=240 
      --> 254-7549 then 1-510-254-7549
  psi/510-688-2363 in or near "Concord, CA" (CONCORD, CA)
    ExpLocal* 13.3mi  [wc=13.3mi] cost=420 
      --> 688-2363 then 1-510-688-2363

Four POPs were found.  The 1st, 2nd, and 3rd say "LOCAL", which means that
they can be swapped in with the primary.  The 1st, 3rd, and 4th have an
asterisk after the call type, meaning that they're flat-rate and therefore
can't be put into the secondary position.  (Actually, the asterisk means
they can't be moved, and therefore they're flat-rate, but that's a detail
worth forgetting.)

This satisfies the provider interleaving rules (the 1st and 2nd POPs are
from different providers), and the flat-rate rule (the 2nd POP isn't
flat-rate).

If the load balancing algorithm wanted to use CNC or UUNET as the primary,
it would fail, because there's no CNC POP and there's no POP eligible for
use as a secondary if UUNET were moved into the first position.  There
is nothing the POP load balancing routines can do here.

Things are looking better for provider rotation though.  If the last byte
of the silicon serial for a user at that location was odd, the script
handed out would have the first two POPs shown above.  If the byte were
even, the tellyscript generator would use the 3rd POP as primary instead.
The fourth POP is ExpLocal, and therefore isn't eligible for rotation.
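
Here is the rotation for this example as a Python sketch.  The exact way
the serial number byte is mapped onto the eligible POPs isn't spelled out
in this document, so the formula below was picked only because it
reproduces the odd/even behavior just described.  Treating "same provider
and same flat-rate marking" as "same cost for us" is also an assumption,
as are the function and field names.

```python
def rotate_primary(pops, last_byte):
    """Swap the primary with another eligible POP, chosen by last_byte."""
    primary = pops[0]
    if primary["type"] != "LOCAL":
        return list(pops)
    eligible = [i for i, p in enumerate(pops)
                if p["provider"] == primary["provider"]
                and p["type"] == "LOCAL"
                and p["flat"] == primary["flat"]]
    if len(eligible) < 2:
        return list(pops)               # nothing to rotate with
    # Assumed mapping: an odd byte keeps the original primary.
    pick = eligible[(last_byte + 1) % len(eligible)]
    rotated = list(pops)
    rotated[0], rotated[pick] = rotated[pick], rotated[0]
    return rotated


hercules = [
    {"provider": "psi",   "number": "510-848-1398", "type": "LOCAL",    "flat": True},
    {"provider": "uunet", "number": "510-982-1757", "type": "LOCAL",    "flat": False},
    {"provider": "psi",   "number": "510-254-7549", "type": "LOCAL",    "flat": True},
    {"provider": "psi",   "number": "510-688-2363", "type": "ExpLocal", "flat": True},
]
for last_byte in (0x55, 0x04):
    first_two = rotate_primary(hercules, last_byte)[:2]
    print(hex(last_byte), [p["number"] for p in first_two])
```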



-= IV.C =-  Tellyscript return codes

After a failure that occurs while the box is connecting to the service, the
box will display a dialog with an error message.  If you hit the "Options"
key on the keyboard or remote, it will display an "M" code and an "S" code,
e.g. "M-26/S10".  The "M" code is the box's message code, and the "S" code
is the return value from the tellyscript.

The current set of tellyscript return values ("S" codes) is:

  0   ParseError - tellyscript was bad.
  1   Connecting - (not really an error)
  2   Success - tellyscript finished successfully
  3   ConfigurationError - modem and box not on speaking terms.
  4   DialingError - modem not saying what we wanted it to.
  5   NoDialtone - didn't hear a dial tone on the phone line.
  6   NoAnswer - POP number just kept ringing.
  7   Busy - POP number was busy.
  8   HandshakeFailure - modem handshake failure; this is rare.
  9   UnknownError - got an unknown result code back from the modem.
  10  BadPassword - authentication failure.
  11  PPPHandshakeFailure - couldn't negotiate PPP successfully.
  12  NoCarrier - something answered, but it wasn't a modem.
  13  BlackHole - rare; last POP was a black hole, and we ran out of POPs.
  14  VerySlowConnect - modems connected at less than 14.4Kbps.
  15  BadPasswordNR - same as #10, but we don't reboot the box.
  16  UnhappyScript - the tellyscript generator blew it.  This is bad.

When dealing with customers who are having trouble calling in, it is
important to get both the "M" codes and the "S" codes.
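
If you find yourself decoding a lot of these, the lookup is easy to
script.  The sketch below is a convenience only; the function name is made
up, the names in the table are copied from the "S" code list above, and the
"M" part is passed through as text because "M" codes are described
elsewhere.

```python
import re

S_CODES = {
    0: "ParseError", 1: "Connecting", 2: "Success",
    3: "ConfigurationError", 4: "DialingError", 5: "NoDialtone",
    6: "NoAnswer", 7: "Busy", 8: "HandshakeFailure", 9: "UnknownError",
    10: "BadPassword", 11: "PPPHandshakeFailure", 12: "NoCarrier",
    13: "BlackHole", 14: "VerySlowConnect", 15: "BadPasswordNR",
    16: "UnhappyScript",
}


def decode_error(text):
    """Split a string like "M-26/S10" into (m_part, s_code, s_name)."""
    match = re.fullmatch(r"M(\S+?)/S(\d+)", text.strip())
    if match is None:
        raise ValueError("not an M/S code pair: %r" % text)
    s_code = int(match.group(2))
    return match.group(1), s_code, S_CODES.get(s_code, "(unknown)")


print(decode_error("M-26/S10"))
```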

The "M" codes are described elsewhere.

Incidentally, the codes defined in the current (client 2.2) box sources
look like this:

  0   kTellyParseError
  1   kTellyConnecting
  2   kTellyLinkConnected
  3   kTellyConfigurationError
  4   kTellyDialingError
  5   kTellyNoDialtone
  6   kTellyNoAnswer
  7   kTellyBusy
  8   kTellyHandshakeFailure
  9   kTellyUnknownError
  10  kTellyBadPassword
  11  kTellyPPPFailed
  12  kTellyNoCarrier
  13  kTellyBlackHole
  14  kTellyDownloadOK
  15  kTellyNoLoader
  16  kTellyNoFirmware
  17  kTellyLoaderFailed
  18  kTellyNoResponseFromLoader
  19  kTellyFirmwareFailed
  20  kTellyNoResponseFromFirmware
  21  kTellyScriptExpired

The meanings of 14, 15, and 16 don't agree, which is unfortunate but not
fatal.  Because the box codes have to do with modem firmware initialization
and not dialing, it's possible to tell which is which from their context.



-= IV.D =-  Dial patterns revisited

An earlier section explained that the service remembers successful dial
patterns, and uses them when generating tellyscripts.  This section
explains the mechanism in more detail.

At about the time that the splash page (the WebTV logo that comes up before
you get to the home page) is appearing on the screen, the box is talking to
a service called logserverd.  The purpose of logserverd isn't to serve
anything; rather, it collects different types of logs that are sent up by
the box, including crash logs, TCP logs, error and warning logs, TV logs,
and phone logs.  What we're interested in here are phone logs, which are
sometimes referred to as "connection logs" or occasionally "configuration
logs".

A simple phone log looks like this:

PhoneLog from 014f7c8201000055 (version=27, length=195)
  numPhoneBusy=0                    tcpInputPackets=1442
  numPhoneNoAnswer=0                tcpOutputPackets=1589
  [ ... blah blah blah we don't care about this blah blah blah ... ]
  realAudio2Used=0
  realAudio3Used=0

  Records:
    0x05 Disconnection
      when=0x3456bde6 (Tue Oct 28 20:39:02 1997)
      disconnectionType=5 "inactivity timeout"  flags=0x04
      connectWhen=0x3456bcb6 (Tue Oct 28 20:33:58 1997)
      dialString='3261095'  fullPOPNumber='650-326-1095' []
      LastConnectionSpeed=28800 LastConnectionCompression=2
      PowerOnReason=0 "normal"
    0x06 NVRAMWrite
      when=0x3456bde6 (Tue Oct 28 20:39:02 1997)
    0x01 RunScriptReport
      when=0x3456bde9 (Tue Oct 28 20:39:05 1997)
      id=0x30e859c5 modWhen=0x344ce6be [Tue Oct 21 10:30:38 1997]
    0x03 GetDialInSuccess
      when=0x3456be04 (Tue Oct 28 20:39:32 1997)
      dialString='3261095'  fullPOPNumber='650-326-1095' []
      callWaitingPrefix=''  dialOutsidePrefix=''  longDistancePrefix=''
      accessNumber=''  tollFreeAccessNumber=''
      flags=0x04 (waittone )
      dialSpeed=1  cwSensitivity=1
      dceRate=33600  dteRate=234000  protocol=0  compression=2
      totalScriptTime=1602  boxIPAddress=207.79.32.54
    PhoneLog_Log()            29677074/014f7c8201000055
    PhoneLog.c:334            logserverd[26157]                 10/28 20:41:28

Every time the box does something "interesting", it adds an entry to its
phone log.  When the box gets connected to the service, it sends the log up
to logserverd, and erases its local copy.  The service collects the logs,
which are used to generate usage reports and POP health statistics.
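
The "when" and "connectWhen" fields in the sample log above appear to be
Unix epoch seconds written in hex, with the parenthesized text showing the
same instant in US Pacific time.  The Python sketch below checks that
reading against the Disconnection record.  The helper name decode_when and
the fixed UTC-8 offset are assumptions made for illustration; the offset
matches the October 28 entries (standard time) but not the October 21
modWhen value, which falls in daylight time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed Pacific Standard Time offset, valid for the Oct 28 entries.
PST = timezone(timedelta(hours=-8))

def decode_when(hex_value):
    """Convert a hex 'when' field to an aware datetime (hypothetical helper)."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=PST)

disconnected = decode_when("0x3456bde6")   # 'when' of the Disconnection record
connected = decode_when("0x3456bcb6")      # 'connectWhen' of the same record

print(disconnected.strftime("%H:%M:%S on %Y-%m-%d"))    # 20:39:02 on 1997-10-28
print(int((disconnected - connected).total_seconds()))  # 304
```

The 304-second difference agrees with the gap between the two printed
times in the log (20:33:58 to 20:39:02).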

A complete discussion of phone logs is beyond the scope of this document.
For now we're just interested in the last entry in the log, which tells us
that the box connected successfully to the service.  (By definition, the
last entry is *always* an indication of a successful connection.  If you
weren't successfully connected, how did you post the log?)

The entry shows that the box connected to the POP at 650-326-1095 by
dialing "3261095".  When logserverd sees this, it adds an entry to the list
of dial patterns indicating that calls from the user's ANI to the POP at
650-326-1095 should be made with 7 digit dialing.

The service screens out numbers that don't correspond to POPs that might be
sent to the box.  If you put a number in the access number field or give
the box a dial override to a POP that it wouldn't normally use, the dial
pattern table will be unaffected.  There are two motivations for being so
picky: limited space, and the need to avoid garbage.  If you had to dial
"9" followed by a 10-digit number, you might be given an override or access
number like "96503240657".  If the service isn't careful, it would record
that you needed an 11-digit dial pattern to dial that number, which
wouldn't be accurate.  Rather than establish a complex set of rules for
screening out *bad* numbers, the service uses a restricted notion of the
set of *good* numbers.
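
The screening described above can be sketched roughly as follows.  This is
not the logserverd code; the function name, the assignable_pops set, and
the assumption that dialString holds only the digits of the POP number (no
prefixes) are all hypothetical.

```python
def infer_dial_pattern(dial_string, full_pop_number, assignable_pops):
    """Return 7, 10, or 11 for a recognized pattern, or None to ignore it."""
    # Only POPs the service might hand to this box are allowed to
    # influence the dial pattern table (the "good numbers" rule).
    if full_pop_number not in assignable_pops:
        return None
    digits = full_pop_number.replace("-", "")
    # Map each way of dialing this POP to its pattern length.
    candidates = {digits[-7:]: 7, digits: 10, "1" + digits: 11}
    return candidates.get(dial_string)

pops = {"650-326-1095"}
print(infer_dial_pattern("3261095", "650-326-1095", pops))      # 7
print(infer_dial_pattern("96503240657", "650-324-0657", pops))  # None
```

The second call mirrors the "96503240657" case: the number isn't in the set
of POPs the box would normally be given, so nothing is recorded.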

Dial pattern entries are stored in "most recently used" order.  What this
means is that the most recently used dial pattern is always at the top of
the list.  The service only holds onto eight entries, so if we already have
eight and then make a new discovery, the entry at the bottom is thrown
out.  If the box logs in, and we see that the ANI, POP, and dial pattern
are already known, we just pull the entry up to the top.  If the ANI and
POP match but the dial pattern is different, we replace the dial pattern
field and then pull it up.  To make matters more complicated, we try to
reduce database accesses by not adjusting the order if the entry is already
one of the top three.
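
Here is a minimal Python sketch of the list maintenance just described,
for the entries belonging to a single ANI.  The names are made up, and one
detail is my reading rather than something stated above: when the POP
matches but the pattern differs and the entry is already in the top three,
the sketch updates the pattern and leaves the order alone.

```python
MAX_ENTRIES = 8        # the service only holds onto eight entries
NO_REORDER_DEPTH = 3   # entries already in the top three are not moved

def record_success(entries, pop, pattern):
    """Update an MRU list of [pop, pattern] pairs in place (index 0 = top)."""
    for i, entry in enumerate(entries):
        if entry[0] == pop:
            entry[1] = pattern                  # no-op if the pattern is known
            if i >= NO_REORDER_DEPTH:
                entries.insert(0, entries.pop(i))  # pull the entry to the top
            return
    entries.insert(0, [pop, pattern])           # new discovery goes on top
    del entries[MAX_ENTRIES:]                   # bottom entry falls off

entries = []
for n in range(9):
    record_success(entries, "pop%d" % n, 7)
print(len(entries), entries[0], entries[-1])  # 8 ['pop8', 7] ['pop1', 7]
```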


The feedback mechanism seems pretty clean on the surface, but there's
actually a race condition during login.  There's no way to be sure that the
phone log will get uploaded before the headwaiter checks to see if the box
needs a new tellyscript.  If the phone log comes up first, then the
headwaiter will compute a new tellyscript that takes into account the
latest dial pattern information.  If the phone log comes up second, the
headwaiter will make its tellyscript decisions without the benefit of
knowledge learned from the current phone log.

Of course, it's even worse than this if you're on the phone with a
customer.  It's possible for them to have the right patterns but not have a
tellyscript that includes that knowledge, because the knowledge was gained
after they got through the headwaiter.  They have to hang up, come in
again, get a new tellyscript with the new patterns, then hang up and redial
*again* to actually use the new patterns.

For these reasons, customers whose dial patterns have been edited manually
are usually told to go back through scriptlessd.  They will immediately get
a script with the latest information.



-= IV.E =-  Secret codes, NVRAM, and "have you moved?"

The "have you moved?" dialog was briefly described in the ANI section.  In
short, it appears whenever the box is powered up after being unplugged.
The dialog has changed over time, with the wording being updated with
almost every client release.  Back in v1.0 the default action (i.e. what
would happen if you just hit the "go" button without moving the selection
rectangle) was "I haven't moved"; in v1.1 and later it changed to "I have
moved"; and in v1.2 we started showing the user's ANI as well.

When you tell the box that you've moved, all it really does is throw out
the tellyscript and the headwaiter's IP address.  When the box sees that it
doesn't have these, it heads off to scriptlessd, which gets the ANI data
and sends down a new tellyscript.

You can get similar behavior by using the "7265" secret code.  This is
related to the "7264" secret code, which has a long history of not working
right.  I don't know offhand which client versions implement the code
correctly (I'm told that *none* of the 1.x releases through 1.3 does it
right!), so unplugging the box and entering "yes, I've moved" is still the
most reliable way to wipe out the tellyscript.  Unfortunately this also
causes the clock to be reset, and in "Plus" boxes this means that the box
can't show the current TV listings until it reconnects.

The "32768" secret code wipes out all of NVRAM.  This is generally a bad
thing, because it kills some other things like screen centering and TV
configuration.  The phone log lives in NVRAM when the box is powered off,
so if a user uses 32768 he loses all phone log data collected up to that
point.  Since the information could potentially help us identify a problem
with his POP or phone line, losing it is bad.  For these reasons, 32768
should only be used as a last resort.

On internal boxes, the "93288" secret code allows you to choose which
service you want to connect to.  The box then wipes out its tellyscript,
which forces it to go back through scriptlessd.  This is necessary because
scriptlessd hands out a shared secret that is used for secure
communication.  If the different services have different notions of what
the shared secret should be, the box won't be able to talk to the new
service, so we send it through scriptlessd to make sure the secret is in
sync.  IMPORTANT: the connection setup information is transient until you
actually get connected.  If you power the box off, or even go into the
dialing setup screen through the convenient button at the bottom of the
screen on some builds, you will lose the information and end up connecting
to the default service for that box.

The "1-800-GoWebTV" code (actually 18004693288) clears NVRAM and then sets
a "force registration" flag.  When the service sees the flag, it sends you
back through registration so you can set up a new subscriber.  The
interactions with tellyscripts are a little funny, because unregistering
the box causes some fields in the device to get reset.  These fields are
normally initialized by scriptlessd.  Since we've already been through
scriptlessd, though, they get cleared and not set again.  In the current
service this is generally harmless, but it could cause unexpected behavior.


Historical note: in the very early days, the box really did have NVRAM
(Non-Volatile RAM).  As a cost-cutting measure, we decided to remove the
NVRAM part and dedicate a small piece (about 16K for US "Classic" boxes) at
the upper end of the flash ROM for storage.  The name "NVRAM" stuck, even
though it now refers to flash ROM for "Classic" boxes and a disk block for
"Plus" boxes.



-= IV.F =-  How phone settings work

There are three prefix fields that may be applied to a dial string.  The
"Basic" screen has the "Prefix" field, "Call Waiting" has the "Block calls"
field, and "Obscure Dialing Options" (known as "Spooky Dialing Options" in
v1.1 and v1.2 clients) has the "Long-distance prefix" field.

The "block call waiting" prefix always gets sent first.  After that comes
either the prefix or the long-distance prefix, depending on which were set
and what kind of call you're making.  The following chart shows all four
combinations of prefixes, and what a local and a long distance call would
look like for each:

  prefix=(none), LD prefix=(none)
    local=6145539    long=18005551212

  prefix=9, LD prefix=(none)
    local=9,6145539  long=9,18005551212

  prefix=(none), LD prefix=8
    local=6145539    long=8,18005551212

  prefix=9, LD prefix=8
    local=9,6145539  long=8,18005551212
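
The chart can be reproduced with a few lines of Python.  This is only a
sketch of the rule as described here, not the tellyscript code; the
function and parameter names are hypothetical, and joining the
call-waiting prefix with a comma is an assumption (the chart doesn't show
that case).

```python
def build_dial_string(number, is_long, prefix="", ld_prefix="", cw_prefix=""):
    """Assemble the digits to dial from the three prefix fields."""
    parts = []
    if cw_prefix:
        parts.append(cw_prefix)      # "block call waiting" always goes first
    if is_long and ld_prefix:
        parts.append(ld_prefix)      # LD prefix wins for long distance calls
    elif prefix:
        parts.append(prefix)         # otherwise fall back to the basic prefix
    parts.append(number)
    return ",".join(parts)

print(build_dial_string("6145539", False, prefix="9", ld_prefix="8"))
# 9,6145539
print(build_dial_string("18005551212", True, prefix="9", ld_prefix="8"))
# 8,18005551212
```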

The determination of "local" or "long" is made by the service when the
tellyscript is generated.  POPs that are LOCAL or ExpLocal are treated as
local, and toll calls are treated as long.  The fallback number is always
considered long distance, as are numbers entered with Vend-A-Telly or
clientpopedit (if you're using the latter two methods, you shouldn't need a
long-distance prefix anyway... just enter the full set of digits you
need).  Ditto for tellyscripts handed out when "IgnoreANI" is set in the
config file (only development servers are configured this way).

For the LD prefix we regard LOCAL and ExpLocal as non-LD, but the system
works differently for the "this may be a toll call" dialog.  Local calls
don't get the dialog, but both ExpLocal and toll calls do.  The reason it's
like this is that the dial prefix is assigned based on the telco definition
of what a local call is, which often has little to do with the call being
inexpensive.
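
Since the two decisions are easy to mix up, here they are side by side as a
small Python lookup.  The table just restates the paragraphs above; the
"toll" label and the function names are placeholders of mine, not
identifiers from the PhoneDB or the service.

```python
# POP type -> (gets the long-distance prefix, shows the toll-call dialog)
POP_RULES = {
    "LOCAL":    (False, False),
    "ExpLocal": (False, True),   # telco-local, but may still cost money
    "toll":     (True,  True),
}

def uses_ld_prefix(pop_type):
    """True if calls to this kind of POP are dialed as long distance."""
    return POP_RULES[pop_type][0]

def shows_toll_dialog(pop_type):
    """True if the "this may be a toll call" dialog is shown."""
    return POP_RULES[pop_type][1]

for pop_type in POP_RULES:
    print(pop_type, uses_ld_prefix(pop_type), shows_toll_dialog(pop_type))
```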

While we're here, I should mention that the "Don't dial 1 for long
distance" flag in Obscure Dialing Options doesn't really have anything to
do with making long distance calls.  If the flag is set, and you're not
using an access number, it just checks for a leading '1' on the POP number,
and removes it if found.  It has no effect on leading '1's in prefix
fields.
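
In Python terms, the flag amounts to something like the sketch below.  The
function and parameter names are invented for illustration; the real logic
lives in the tellyscript.

```python
def apply_dont_dial_1(pop_number, flag_set, access_number=""):
    """Strip a leading '1' from the POP number when the flag applies."""
    if flag_set and not access_number and pop_number.startswith("1"):
        return pop_number[1:]
    return pop_number    # prefix fields are never touched by this flag

print(apply_dont_dial_1("14085551234", True))                 # 4085551234
print(apply_dont_dial_1("14085551234", True, "18005550100"))  # 14085551234
```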

One final note on prefixes: most of the "Classic" boxes on store shelves
and in warehouses are v1.0 clients.  These boxes only have the "basic"
prefix, so the script behaves as if the other prefix fields exist but were
left blank.


Understanding how the other phone settings are handled isn't vital but may
come in handy.  If you need to understand precisely how something is
handled, Initialize() in base.tsf (found in the network source tree) has
the ultimate answer.

Pulse Dialing - we send a DT to the modem for tone or DP for pulse.

Call Waiting - for the US, the S10 setting determines all.  There are five
values, one meaning "off" and four meaning "on" with different sensitivity
levels.  For Japan we also set S220.

Wait For Dialtone - determines whether the modem should wait until it hears
a dialtone, or just sit there for three seconds and then go.  Set by
tweaking S6.

Audible Dialing - send M0 or M1.  The tellyscript always turns audible
dialing off when connecting with a VideoAd.

Dial Speed - three settings, set with S11.  We now also set &P to control
the speed of pulse dialing.  This really only applies in Japan, but the US
seems to work with the Japanese settings.  The cool thing is you can now
crank up the pulse speed if you select "fast dialing".

Access numbers are covered in the next section.



-= IV.G =-  Radius, access numbers, and PSI

When a box logs in, its tellyscript knows how to send a login and password
that the provider of the POP will accept.  All of our providers use a
system called Radius to verify login names and passwords, and all but one
use a "proxy Radius" configuration that allows WebTV to make the actual
accept/reject decision.

The usual sequence of events during login starts with the box sending up a
login and password to the IAP's Radius server, using the PAP authentication
protocol.  Attached to the login name is a special prefix or suffix that
tells the IAP that the request is coming from a WebTV box.  The IAP's
Radius server forwards the request to our Radius server, which verifies the
login and password, and sends back an ACK ("yes") or NAK ("no") response.
By doing things this way, we retain control over which boxes are allowed
in, and avoid the hoops we had to jump through with a provider like PSI.

PSI refused to do proxy Radius, so we have to create an account with them
for every box before the box ever logs in.  This means that scriptlessd has
to connect to their system and create the account before the box can hang
up and redial.  Otherwise, if the account creation attempt failed (as it
occasionally does), we would end up giving the user a tellyscript with POPs
that they can't dial into.  If scriptlessd isn't able to contact PSI, all
PSI POPs are stripped out of the script, and the box gets whatever is
left.  A more thorough discussion of service changes and the potential
dangers involved with doing things this way can be found in the source tree
in network/src/doc/PSI.

Hybrid IAPs, which can be accessed as either flat-rate or hourly-rate
providers, have two different Radius prefixes or suffixes available.  One
prefix indicates the connection should be billed at the flat rate, the
other indicates it should be treated as an hourly rate call.


In the 1.0 client, if the last POP in your list failed with a Radius
authentication error, you would get a message that said "your box needs to
be reconfigured".  As of v1.1 the box would simply wipe its brain and
restart.  More recent service releases removed this behavior, but it may
come back depending on what sort of security mechanisms we choose to
implement.

Authentication failures on POPs other than the last in the list just cause
the tellyscript to roll on to the next POP.  It's only the last POP that
has potentially dire consequences.


We know that each IAP has a different Radius suffix or prefix.  Each may
also require a different password.  If you type a POP number into the
"Access Number" field in the Dialing Options screen, which values should it
use?

The trouble is that the tellyscript has no way of determining which IAP the
phone number in the Access Number field is associated with.  The number is
held directly on the box, not by the service, so we'd have to download the
complete set of POPs to the box to make this work smoothly.

The solution we chose to implement was to use whatever IAP happens to come
first.  If your primary POP is from CNC, then you can enter any CNC number
in the access number field and it will work.  Entering a POP number for
UUNET, ZipLink, PSI, or any of the other IAPs will fail, unless Radius is
configured in a particularly forgiving manner.  (Incidentally, this is why
we're so paranoid about showing toll-free numbers: we were using CNC's 800
number for quite a while, and the Radius authentication information was
exactly the same as CNC's regular POPs.  If you were one of the 60% of our
customers who had CNC as their primary POP at the time, you could get
toll-free access at our expense just by putting the CNC 800 number in your
Access Number field.)

Some people who do international demos have had cause to enter "cnc-palo",
"uunet-palo", or "artemis-palo" in the "Enter Your Phone Number" screen
that scriptlessd shows when it can't get ANI data.  The reason this was
added wasn't so much to allow them to use a specific POP as it was to get a
specific IAP into the first position.  If you know that a UUNET POP comes
first in your tellyscript, you know that putting a UUNET POP in the access
number field (along with whatever weird things need to be done to dial out
of a foreign country or to dial that IAP's POPs within the foreign country)
will work.

Because of the difficulty in getting the POP number matched up with the
first entry in the tellyscript, using this field is strongly discouraged
except in certain rare cases.


One place where the access number field is useful is when it's not really
used as an access number.  A special feature was added to the service to
support dialing *suffixes* via the access number field.  If the '$'
character appears in the access number field, the tellyscript will replace
it with the POP number currently being dialed.

For example, if you set your access number to "10288,$,54321", and the POP
numbers assigned by the service are 3261095 and 6145539, the box will dial
the string "10288,3261095,54321" (the commas are brief pauses), and if that
fails, it will next try "10288,6145539,54321".  (Prefixes like 10288 really
ought to go in the prefix fields rather than the access number field; I
included it here to show that the '$' can be anywhere.)  This isn't really
the intended use for the Access Number field, but since the intended use is
all but useless it was deemed acceptable.
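
The substitution is simple enough to show in a few lines of Python.  This
is a sketch, not the tellyscript source: the function name is made up, it
only covers access numbers that contain a '$', and replacing only the
first '$' is an assumption.

```python
def expand_suffix_template(access_number, pop_numbers):
    """Return the strings dialed for each POP, or None if there is no '$'."""
    if "$" not in access_number:
        return None   # ordinary access number; not handled by this sketch
    return [access_number.replace("$", pop, 1) for pop in pop_numbers]

print(expand_suffix_template("10288,$,54321", ["3261095", "6145539"]))
# ['10288,3261095,54321', '10288,6145539,54321']
```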


In v1.1 and later clients, 77437 brings up the Obscure Dialing Options
page.  Here you can enter an "800 access" number that will replace the
toll-free scriptlessd number.  The scripts sent down by the service don't
even look at this field, because it only matters when you're dialing into
scriptlessd.

In v1.0 boxes, the Access Number field does double duty, and will change
the number used to dial scriptlessd.  This makes it extremely cumbersome to
use, because you have to set it to one thing while dialing scriptlessd, and
then change it to another before dialing into a POP.  The toll-free access
number field was added to help people doing international demos and other
situations where an access number was needed just for scriptlessd calls.



-= IV.H =-  OpenISP

OpenISP, which has also been known as Pick-an-ISP, BYOISP, and OpenAccess,
has been around conceptually since one of the early "connectfolk" meetings
in late 1996.  It wasn't until the first part of 1997 that it went from
being considered more trouble than it was worth to a high priority.  One of
the driving factors was competition: every competitor we had claimed to
work with arbitrary ISPs, and in fact some of our competitors used this
feature as their sole distinguishing characteristic.

The idea behind OpenISP is that you can choose to use your own ISP instead
of the ones that WebTV provides.  Any ISP that supports PPP (a standard
network protocol) and PAP (a standard method of sending up login and
password information) will work.  All you have to enter are your login
name, password, and the phone number to dial, and everything else just
works.

Surprisingly few changes were needed to implement OpenISP.  Most had to do
with presenting an appropriate user interface, and making sure that the
feature was activated and disabled when appropriate.  The login, password,
phone number, alternate phone number, and an ISP name (which isn't really
used) are all stored in NVRAM on the box, and the tellyscript pulls these
values out and uses them.  The service doesn't store these values (see
"Keeping OpenISP Closed" on the DocArchive web site listed in section VI).

Because of an early design decision that later got changed (for a while the
box was going to inject the login and password into the script; now the
script goes looking for the data), and also to keep the size of a
tellyscript small, tellyscripts are either OpenISP scripts or standard
scripts.  We don't send down a script that can either dial OpenISP or dial
standard POPs.  This may change.

You can tell if somebody most recently received an OpenISP tellyscript by
looking at the information shown by clientinfo or CMR.  It will look
something like this:

  Most recent script sent to client:
    Hash 0xdb4fc0fc, sent Tue Oct 28 13:13:03 1997
    v36 base/-
    v2  locale/-
    v2  OpenISP/-

The only IAP listed is "OpenISP".  The service doesn't know what provider
they're using or what number they're dialing, so those can't be shown.


The call ordering for OpenISP is like this if they entered one number:

  Call first number
  Retry first number

If they entered two numbers, it goes like this:

  Call first number
  Call second number

After making two calls we give up.  We never try a fallback number.  If you
want to use your own ISP, guess what, you're going to use your own ISP.
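
Expressed as Python, the ordering looks like this (a sketch with a
hypothetical function name; the phone numbers are placeholders):

```python
def openisp_call_plan(first_number, second_number=None):
    """Return the two calls an OpenISP script makes before giving up."""
    if second_number:
        return [first_number, second_number]
    return [first_number, first_number]   # one number: call it, then retry

print(openisp_call_plan("5550100"))             # ['5550100', '5550100']
print(openisp_call_plan("5550100", "5550199"))  # ['5550100', '5550199']
```

Either way the list has exactly two entries, and no fallback number is
ever appended.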

The Access Number field is ignored for OpenISP users, unless they use the
fancy kind of access number that has a '$' in it.  Dial patterns and dial
overrides have no effect on OpenISP customers.



-= IV.I =-  Client upgrades and brain-dead boxes

Client upgrades for WebTV "Plus" boxes are terribly uninteresting, because
they can do the download without disconnecting.  Also, WebTV "Plus" boxes
with damaged approm images go into the "mini-browser", which has most of
the features you'd find in a full v1.3 client.  The discussion here
concentrates on "Classic" boxes, which are far more interesting.


Client upgrades (a/k/a flash downloads) for WebTV "Classic" boxes are done
by the boot ROM, because you can't be executing code from a ROM image that
you're updating.  The boot ROM has a minimal subset of the features
available in the full ROM (it's 1/8th the size).  All it really knows how
to do is dial in, issue simple requests, and write chunks of data into
flash ROM.

The usual behavior is that flashromd tells the client to go flash itself.
The box hangs up, dials back in with the current tellyscript, reconnects to
the same flashromd, and starts asking for parts.  When it has all of the
pieces, it hangs up and reboots.  (More details than you could possibly be
interested in are available from network/src/doc/flashromd.)

Because the box is essentially a v1.0 client during downloads -- regardless
of what client version was running on the box before -- some tellyscript
gymnastics are required to get at dialing options added after v1.0, notably
the "don't dial 1" flag and OpenISP settings.  These were broken for a
while, but should work as expected now.  (See the "trouble with dial
options" document at http://webhost-1/~fadden/DocArchive/ for details.)


The term "brain-dead box" refers to a WebTV "Classic" unit with a damaged
ROM image.  The easiest way to get brain-damaged is to initiate a download
and have it interrupted before completion.  When the box restarts, it does
a checksum on the ROM, and discovers that things don't look the way it had
expected.  It boots into the boot ROM and immediately starts a flash
download.

The boot ROM ignores everything in NVRAM, because flash is corrupted and
NVRAM is held in flash.  It will accept an access number and a dial prefix,
which have to be entered with the extremely limited user interface supplied
by the boot ROM, but most of the other dial options can't be set.  You
can't use any secret codes with a brain-dead box, because the codes are
handled by the full client ROM, not the boot ROM.

Every time the brain-dead box is powered on, it connects to scriptlessd,
asks for a tellyscript and an IP address for flashromd, then disconnects
and executes the tellyscript.  After connecting to the local POP, it
initiates a download.


An interesting problem arises when an OpenISP box becomes brain dead.  We
no longer have access to the person's OpenISP login and password, because
those are kept in NVRAM, and we can no longer believe that NVRAM is valid.
We have to send them somewhere else.  But where?

The obvious choice is to send them to the POPs that they would have if they
weren't an OpenISP user, but there are a couple of problems with that.
First of all, the user might have signed up for OpenISP because they didn't
have any local POPs.  Their POPs might be toll calls, which isn't going to
make them very happy.  Second, it's possible that their primary POP is a
flat-rate IAP, which means we will have to pay for a full month of service
for this user if they only show up once to do a download.

There are two alternate solutions.  The first is to send the user to an 800
number.  This is a fairly good solution, because it doesn't cost the user
anything, and it may well cost us less than the usual primary POP.  The
down side is that it requires a large short-term increase in port capacity
on our 800 lines.  If we have a hundred thousand OpenISP users, and even a
small percentage go brain-dead, we're going to need to add a lot of modems
for a couple of weeks to handle the load.

The second solution is better but more difficult.  The box ignores the
NVRAM settings because the ROM checksum failed, and it can't trust that the
values in the NVRAM section of memory are good.  However, the tellyscript
that we send down is capable of running its own checksum on NVRAM, and
using the values there if they're valid.

This gets complicated when you consider that, until now, tellyscripts are
either OpenISP or non-OpenISP.  The second solution requires that the box
be able to dial either, and decide which it's going to do when the box
starts up.  The only good news is that the box will arrive at scriptlessd
when brain-dead, and won't store the "double" script in NVRAM, so any
wackiness in the script doesn't necessarily have to affect anybody else.

We will need to move to the second solution at some point, but for now
we're just sending brain-dead OpenISP boxes to an 800 number.



-= IV.J =-  ComingSoon and friends

The "coming soon" program was, arguably, a bad idea.  What is indisputable
is that it cost an arm and a leg.  We will likely have some hangers-on for
a while yet, so it's worth explaining what it is, why we did it, and why it
went away.

In the halcyon days of WebTV's youth, we discovered that our IAPs' claims
of covering well over 90% of the country were subject to interpretation.
They weren't far off -- the actual figure was around 87% -- but that last
13% was a large and noisy bunch.

In an attempt to kick-start an increase in local coverage as we were
entering the 1996 holiday season, we were directed to institute what became
known as the "coming soon" coverage plan.  Rather than wait until we had a
signed contract with an IAP, we would provide the same coverage that the
IAP did using an 800 number.

To be eligible for "coming soon" access, you had to be in a situation where
you didn't have a local call to a "real" POP, but did have a local call to
a "coming soon" POP.  That meant you didn't have a local call, but you were
going to have one real soon.  The POP lists for the IAPs that were coming
real soon were added to the PhoneDB, and pretty soon we were letting
hundreds of people surf the net at our expense.

Getting new IAPs to sign up turned out to be a bit of an ordeal.  Some of
the IAPs we threw into the mix weren't technically competent or didn't have
(and would likely never have) the kind of capacity we needed.  Others were
unwilling or unable to configure Radius servers the way we wanted, and some
took months of negotiation before either they signed or we gave up in
frustration.  The net result was that we were paying per-minute charges for
several months.

The project, which cost several million dollars over its lifetime, was
finally killed in October 1997.  A couple hundred people still didn't have
a "real" ISP, so they were "grandfathered" in with dial overrides to a
different 800 number.


A similar but less painful situation exists in Phillips, WI and Webb, MS.
These two small towns were to be part of an advertising campaign
capitalizing on the names of the cities.  Since neither had a local ISP,
both were granted perpetual free access via an 800 number.  Nothing ever
came of the marketing plan, but we still shell out money for a box in the
library in Phillips.  In both cases, the override was done for the entire
NPA/NXX by making a special entry in the PhoneDB.



-= IV.K =-  Pick-yer-POP

The Pick-yer-POP program was a good idea that had some serious flaws.  The
basic idea was to allow the customer to choose their own POPs from a list.
They would be able to specify how many digits to dial, and change to a
different POP at will.

The most significant barrier to implementing this was flat-rate IAPs.  If a
user switched between three different flat-rate IAPs during the course of a
month, we would have had to pay 3x the fees for that one user.  A related
issue is what happens when a user chooses an hourly-rate IAP as the
primary, and then proceeds to use it for a large number of hours.  With a
flat-rate primary we would pay a fixed amount, but with an hourly primary
the costs could be much higher.

We can't afford to lose control over POP assignments unless we have some
way of making the user share the costs.  If they use a POP that costs us
more than the POP that we would have given them, we have to bill them for
the difference.  Unfortunately this is difficult to calculate, and even
more difficult to explain to the customer.

Pick-yer-POP also removes any hope of load balancing.  I would expect users
struggling to get in during peak hours to change their POP frequently,
resulting in large swings between local IAPs and lots of complaints.

The proposed implementation for Pick-yer-POP was essentially a user-driven
dial override.  Even now clientpopedit allows you to specify whether an
override is being set for Pick-yer-POP or not.  This will likely be removed
in a future service release.



-= IV.L =-  MessageWatch and EPG

MessageWatch is the fancy name we use for a feature that allows the box to
dial in at a specified time and check for new mail.  The idea was to have
it log in during the early morning hours, so that you can see if you have
new mail without needing to log in when you wake up.

Unfortunately, a fairly large number of people configured it to log in
around 5pm, so that the mail light is set when they get home from work.
This is unfortunate because it means the boxes on the west coast are coming
in at the height of peak usage on the east coast.

Whatever the case, MessageWatch connections are vastly simplified versions
of normal connections.  A few salient facts:

 - The box only talks to the headwaiter.  It continues to accumulate phone
   log data, but doesn't send anything up to logserverd.
 - The box will retry every 30 minutes if it can't get in.
 - The box will shut itself off after 2.5 (?) minutes, no matter what.
 - If a user has one local and one toll call, only the local POP will be
   used.
 - If a user has nothing but toll calls, only the first toll POP will be
   used.

If a user is seeing multiple calls starting at a specific time and
separated by 30 minutes each on his or her phone bill, chances are
MessageWatch is involved.
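
The POP selection rules in the list above can be sketched in Python as
follows.  The function name and the (number, is_toll) representation are
hypothetical, and the list doesn't say what happens with several local
POPs; the sketch assumes they all remain eligible.

```python
def messagewatch_pops(pops):
    """pops: list of (number, is_toll) tuples in tellyscript order."""
    local = [number for number, is_toll in pops if not is_toll]
    if local:
        return local                  # never spill over to a toll POP
    if pops:
        return [pops[0][0]]           # all toll: only the first one is used
    return []

print(messagewatch_pops([("6145539", False), ("14085551234", True)]))
# ['6145539']
print(messagewatch_pops([("14085551234", True), ("14155551234", True)]))
# ['14085551234']
```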

WebTV "Plus" boxes do something similar with EPG (Electronic Program Guide)
data downloads.  However, in the 2.1 client, the EPG downloader won't stop
with the first POP if the second one is toll.  There are plans to fix this
for future client releases.

In "Classical" and earlier service releases, MessageWatch is only enabled
when the user turns it on.  In "Disco" and later, it may be enabled for all
new users by default.



-= IV.M =-  Idle timeouts

Idle timeouts make the box disconnect from the service and hang up the
phone when nothing has happened for a set period of time.  There are two
kinds of idle timeouts, input timeouts and network timeouts.

Input timeouts happen when the user stops using the box.  If the box
doesn't see any activity from the user, such as typing on the keyboard or
hitting buttons on the remote control, it will disconnect after 10
minutes.  This timeout is set by the service.  If the user is connected
through an 800 number (determined by comparing the box's IP address against
a list of known values), the input idle timer is reduced to 5 minutes.
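
A rough Python sketch of how the service might pick the input timeout is
shown below.  The function name is invented, and the address range is a
documentation placeholder standing in for the real list of 800-number
addresses, which isn't given here.

```python
import ipaddress

# Hypothetical stand-in for the list of addresses handed out on 800 lines.
TOLL_FREE_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

def input_idle_timeout_minutes(box_ip, networks=TOLL_FREE_NETWORKS):
    """Return the input idle timeout, shortened for 800-number callers."""
    addr = ipaddress.ip_address(box_ip)
    on_toll_free = any(addr in net for net in networks)
    return 5 if on_toll_free else 10

print(input_idle_timeout_minutes("207.79.32.54"))  # 10
print(input_idle_timeout_minutes("192.0.2.17"))    # 5
```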

Network timeouts happen when no packets are being transmitted between the
box and service.  The box used to have a network idle timeout, but this is
no longer in use.  However, some IAPs, notably CNC, have idle timeouts on
their equipment.  After 30 minutes with no network activity, CNC's terminal
servers will drop the line.

If a user is flipping through a large page, or is composing a long e-mail
message, there is no network activity.  The box won't choose to disconnect,
but the terminal server will.  If a user is experiencing line drops while
composing long e-mail messages, this is probably the cause.

Some providers have time limits that don't care whether you're idle or
not.  After an hour or two the connection is dropped, so that computer
users can't leave their machines running and wander off.  (Some computers
will just redial when disconnected anyway, but try telling that to the
IAPs.)  We have added something similar in the form of a usage cap on the
fallback 800 number (more details later).



-= IV.N =-  Adding new providers

Adding a new provider to the system isn't something that most people will
have to do.  If done incorrectly, however, it can adversely affect a large
number of people.  This section explains the right way to do it, when it
should and shouldn't be done, and how things fail if it's done the wrong
way.

Each IAP should be a separate provider.  A provider is defined by a
"Provider:" line in a POP list in the PhoneDB.  Several attributes are
defined for each:

 - Symbol.  This is a single character that represents the provider in
   certain output formats.  The PhoneDB doesn't explicitly check this for
   uniqueness, so it may be unwise to depend on this value.  CNC's symbol
   is 'C'.
 - Abbreviation.  This uniquely identifies the provider, and is used in
   tellyscripts.  SOC is using the IAP's domain name as the basis for
   choosing abbreviations.  CNC's abbreviation is "cnc".
 - Cost (also known as "static priority").  The higher the cost, the less
   willing we are to use the POP.  This only affects PhoneDB generation;
   it has no effect on load-balancing.  The costs are relative to each
   other, and have no absolute meaning or relationship to actual dollar
   amounts.
 - Billing method.  May be "flat", "hourly", "per-port", or "flat-hybrid".
   For PhoneDB generation the only thing that matters is whether it's
   "flat" or not, but other service components (like POPtimization and
   tellyscript generation) are more discriminating.
 - Full name.  For CNC this is "Concentric Network".  This is rarely used.

All of the above are included in and available from the PhoneDB.  The
"dumppops" phonetool command will display them (see the phonetool README).

The choice of abbreviation is important, because it's used in the
tellyscript, in reports, and often in casual conversation.  It has to
follow C syntax rules for function names, which means it has to start with
a letter and may only contain letters, digits and the underscore ('_').  No
spaces, dashes, periods, or other fancy characters are allowed.  It can't
be longer than 15 characters, and by convention is entirely lower case.
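
The naming rules above are easy to check mechanically.  Here is a small
sketch; the function name is hypothetical, and it is not part of phonetool
or rawphonetool.

```python
import re

# Starts with a letter; letters, digits and '_' only; at most 15 characters.
_ABBREV_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,14}")

def check_abbreviation(abbrev):
    """Return a list of problems; an empty list means the name is acceptable."""
    problems = []
    if not _ABBREV_RE.fullmatch(abbrev):
        problems.append("must start with a letter, contain only letters, "
                        "digits and '_', and be at most 15 characters")
    if abbrev != abbrev.lower():
        problems.append("by convention should be entirely lower case")
    return problems
```

With this, "cnc" and "uunetdan" pass, while "cnc-800" and "1cnc" are
rejected.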

More information on POP lists can be found in the rawphonetool README
(network/src/tool/rawphonetool/README).


Adding the "Provider:" line and a few POPs to a POP list is only half the
story.  The other half is adding a new .tsf file.  When tellyscripts are
generated, the service gathers up the .tsf files for every provider that
might be dialed, and combines them with several other components to form
the complete script.  The service doesn't attempt to verify that the
tellyscript fragment is correctly written, so it is imperative that the
script be error-free.

Here's the current script fragment for ZipLink (ziplink.tsf):

-----
/* TLLY ver=2 */
/*
 * This is included from "ziplink.tsf".
 */
Chat_ziplink()
{
        setwindowsize(7);
        return PAPChat("ZTV/%s", 0);
}

Chat_ziplink_2()
{
        setwindowsize(7);
        return PAPChat("ZTV/%s", 0);
}

/* --- end of ziplink.tsf --- */
-----

The "TLLY ver=2" at the top specifies the version number that you see on
the "clientinfo" output.  This should be incremented every time the script
is changed.  The first line must look EXACTLY like the one shown above, or
the service will reject the script.
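
A quick way to sanity-check the first line before checking a file in.  This
is a sketch, not the service's own test: the function name is made up, and
it assumes that "EXACTLY" means a character-for-character match with single
spaces, as in the ZipLink example.

```python
import re

_HEADER_RE = re.compile(r"/\* TLLY ver=(\d+) \*/")

def tsf_version(text):
    """Return the version from the first line of a .tsf file, or None
    if the first line is not a well-formed TLLY header."""
    first_line = text.split("\n", 1)[0]
    m = _HEADER_RE.fullmatch(first_line)
    return int(m.group(1)) if m else None
```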

There are two C-like functions, both named with the provider abbreviation.
They each call the setwindowsize() function, which sets a TCP window size
that may be different for each provider (7 works for nearly everyone), then
they call PAPChat with an argument that specifies how the Radius prefix or
suffix is to be applied.  The "%s" gets replaced with the box's login
name.  In this case, ZipLink uses a Radius prefix of "ZTV/".

There are two functions because there are two different ways to get to
ZipLink, the flat-rate way and the hourly-rate way.  This is how we support
the "flat-hybrid" billing model: the tellyscript calls the first function
for the primary POP, and the second function for later POPs.  We're not
currently taking advantage of the hourly-rate plan for ZipLink, so both
prefixes are the same.  It doesn't really hurt to have both functions when
we're not using the feature, but it does hurt if we're missing one and
try to use it, so it's best to define both and make them equivalent.
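
Since a missing second function only shows up intermittently in the field,
it is worth checking for both definitions up front.  The sketch below does
a plain text search for definitions at the start of a line; the function
name is hypothetical, and this is not how the service itself inspects
fragments (it doesn't).

```python
import re

def missing_chat_functions(tsf_text, abbrev):
    """Return the names of Chat functions that the fragment fails to define."""
    wanted = ["Chat_%s" % abbrev, "Chat_%s_2" % abbrev]
    missing = []
    for name in wanted:
        # A definition looks like "Chat_ziplink()" at the start of a line.
        pattern = r"^%s\s*\(\s*\)" % re.escape(name)
        if not re.search(pattern, tsf_text, re.MULTILINE):
            missing.append(name)
    return missing
```

For the ziplink.tsf fragment shown earlier, missing_chat_functions(text,
"ziplink") returns an empty list.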

The format is simple enough, but if you have any doubts you can always run
the .tsf file through a C compiler.  (You will want to have some other
things defined if you don't want to be drowning in warnings; see
network/src/lib/tellyscript/scripts/ScriptIncl.h.)


What happens if we have a .tsf file, but no "Provider:" entry in a POP
list, and therefore no information about the provider in the PhoneDB?
Nothing.  The service will not have heard about the provider, so it won't
try to use it.  Heck, without a POP list there's nothing to use anyway.

What happens if there's an entry in the PhoneDB, but no matching .tsf
file?  Bad things.  headwaiterd will refuse to send a new tellyscript to
people who would get a script with the partially-defined provider, and
scriptlessd will actually send people off to wtv-*.  The reason scriptlessd
was written this way was to avoid sending such users to an 800 number, and
to make it immediately obvious that a serious but easily correctable
problem exists.  It would be nicer to send the users to alternative POPs
and inform a pager instead of the customer.  This may be implemented in a
future service release, especially since brain-dead boxes will report the
mysterious "couldn't get IP address" error in this situation.

What happens if there's a PhoneDB and a .tsf file, but the .tsf file
contains an error, or is missing the second function for a hybrid
provider?  Very, very bad things.  The box will probably crash when it
tries to execute the tellyscript.  For the case of the missing second
function, the failures will be intermittent, because they will only happen
when users who have the provider as a secondary fail to connect to their
primary POP.

It is *always* prudent to test new PhoneDB and .tsf combinations with the
Vend-A-Telly page before releasing them to customers.
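
A cheap cross-check that can be run before that test is to compare the
provider abbreviations in the PhoneDB against the set of .tsf files.  The
sketch below assumes you have already collected both lists by some other
means (for instance from "dumppops" output and a directory listing); the
function and key names are hypothetical.  It does not replace the
Vend-A-Telly test, because it says nothing about whether a fragment is
correctly written.

```python
def audit_providers(phonedb_abbrevs, tsf_abbrevs):
    """Compare PhoneDB providers against available .tsf fragments."""
    phonedb, tsf = set(phonedb_abbrevs), set(tsf_abbrevs)
    return {
        # .tsf with no "Provider:" entry: harmless, the service ignores it.
        "unused_tsf": sorted(tsf - phonedb),
        # "Provider:" entry with no .tsf: bad things, fix before release.
        "missing_tsf": sorted(phonedb - tsf),
    }
```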


In general, there should be a 1-to-1 mapping between providers and IAPs.
The load balancing and provider interleaving algorithms do their best to
avoid saturating a user with POPs from the same provider, but the only way
they can tell whose POPs are whose is by the provider abbreviation.  If you
split cnc into cnc1 and cnc2, there's nothing to prevent the user from
getting cnc1 as their primary provider and cnc2 as their secondary, and a
localized network outage within CNC will shut out the user.

If a provider has multiple categories of POPs, such as new POPs with higher
capacity that are meant to replace older ones, you can give higher priority
to the better POPs by assigning cost values to individual POPs.  This will
cause the PhoneDB to place them ahead of otherwise equivalent POPs from the
same provider, and will prevent the provider rotation in the service from
swapping the POPs around.

There are two cases where we've broken this rule.  The first is "cnc" vs
"cnc800".  We used a separate provider here to make it easy to spot the users
who were on the 800 number.  This was only used for "coming soon" and other
special programs, so there was no risk of multiple CNC assignments causing
trouble.  The second case was "uunet" vs "uunetdan".  Again, it was felt
important to distinguish the two because we had radically different pricing
on them, and more importantly we wanted the load balancing parameters to
only affect the uunetdan set.  Since uunet was given a very high cost (low
priority), and uunetdan a very low cost (high priority), there was little
chance of a user ending up with one of each unless they had no other POPs
anyway.



-= IV.O =-  VideoAds

VideoAds are short (15-second) VideoFlash clips that play when the box is
powered on.  They are downloaded during a MessageWatch connect, play once,
and are then thrown away.  This feature was first added in the 1.3 client.

There are a number of restrictions on the set of users that get VideoAds.
The download takes about 5 minutes, which isn't terribly long, but if the
box is making a toll call every night it can add up.  We also want to
control our own costs by not sending the VideoAds to users with hourly-rate
POPs.  Even if the revenue from an ad impression is more than the cost of a
5-minute call on an hourly IAP, we won't come out ahead unless the user
logs in almost every day.  We only get the revenue if the ad plays, and the
ads are sent down every night whether the box plays the ad or not.

The rules are:

 - Don't send it if they're using OpenISP.
 - Don't send it if they're making an ExpLocal or toll call.  For
   MessageWatch connects, this can only happen if they have no local
   calls at all (see the section on MessageWatch and EPG).
 - Don't send it if they're on an 800# POP.  This includes "coming soon"
   POPs and the fallback 800 number.  We can determine the former, and
   the script can block the latter.
 - Don't send it if they're connected to an hourly provider.  I'm going
   to approximate this by checking the primary, on the assumption that
   we never assign hourly POPs as primary if there's a flat or per-port
   available.  (This is a bad assumption when POPtimization is in effect,
   but it'll do for now.)
 - Don't send it if they're not in the right user category.
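
Taken together, the rules amount to a chain of early rejections.  The
sketch below is one way to write that down; the Connection record, its
field names, and the call-type labels are all assumptions made for
illustration, and the hourly test uses the same "check the primary"
approximation described in the list.

```python
from dataclasses import dataclass

@dataclass
class Connection:
    open_isp: bool         # user is on OpenISP
    call_type: str         # "local", "explocal" or "toll" (assumed labels)
    on_800_pop: bool       # "coming soon" POP or fallback 800 number
    primary_billing: str   # billing method of the primary POP's provider
    in_ad_category: bool   # user is in the right user category

def should_send_videoad(c):
    """Apply the VideoAd rules in order; any failed rule blocks the ad."""
    if c.open_isp:
        return False
    if c.call_type in ("explocal", "toll"):
        return False
    if c.on_800_pop:
        return False
    # Approximation: judge "hourly" by the primary POP only.
    if c.primary_billing == "hourly":
        return False
    return c.in_ad_category
```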

The VideoAd plays during the first part of the box's connection to the
service.  Instead of seeing the Road to Nowhere and the connection status
bar, you watch the movie.  Audible dialing is disabled for connections that
start with a VideoAd playing.



-= IV.P =-  Automatic Number Frustration

There are cases where ANI doesn't work that weren't worth covering in the
introductory sections, but should be mentioned for completeness.

One of the unnerving things about ANI is that anybody with a T1 or PRI can
convince you that they're calling from anywhere.  Some PBX systems,
especially those targeted for use by telemarketers, explicitly allow you to
set the outbound ANI and CallerID information.  This means that sometimes
the service will receive ANI that is inadvertently or deliberately
misleading.

A prime example is Microsoft, whose Redmond campus phone system was sending
up ANI values that looked like "100-010-1180" or "100-111-5566".  Clearly
these aren't valid US phone numbers.  In this situation, the service will
put up the "enter your ANI" page.
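
Numbers like these can be rejected on format alone, because in the North
American Numbering Plan neither the area code nor the exchange may begin
with 0 or 1.  The sketch below shows only that format-level test.  It is
not the service's validation code; the function name is made up, and it
assumes the input is the 10-digit number with any OLS digits already
stripped.

```python
import re

def parse_ani(ani):
    """Return (npa, nxx, line) if the ANI is plausibly a US number,
    else None."""
    digits = re.sub(r"\D", "", ani)
    if len(digits) != 10:
        return None
    npa, nxx, line = digits[:3], digits[3:6], digits[6:]
    # Area codes and exchanges never start with 0 or 1.
    if npa[0] in "01" or nxx[0] in "01":
        return None
    return npa, nxx, line
```

A check like this catches "100-010-1180", but it cannot catch a number that
is well-formed and merely unassigned or out of date.  Those cases need a
lookup against the exchange database.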

A more insidious example is a store whose number was 804-850-xxxx.  After
an area code split, their number changed to 757-850-xxxx, but the PBX was
never updated.  When CCMI finally removed the exchange from the database,
we no longer recognized the ANI as valid, and (on the assumption that it
was a new exchange that we weren't recognizing yet) the service started
handing out tellyscripts with an 800 number.  Not only does this cost us
money, it might cost the store money in the future: if the exchange were
used for a different location in the new area code, it might be a
considerable distance away from the store, and the store would start making
toll calls because their ANI is wrong.

Another fun case was the user showing up with 415-700-xxxx.  This exchange
doesn't exist, and apparently never has.  As it happens, the caller with
this ANI was in Paris, France, and was using an international 800 code to
get to us.  For whatever reason, the carrier decided to return 415-700-xxxx
as the source.



  |                        |
-=*=-  V. Extra Goodies  -=*=-
  |                        |


-= V.A =-  OraclePhoneDB and POPtimization

Until the "Disco" service at the end of 1997, the PhoneDB had just been a
file on disk.  With Disco, the PhoneDB is also kept in the database, and in
some future release the disk file may vanish altogether.  The purpose
behind this is to gain greater flexibility and provide direct access to the
PhoneDB for database queries.

One of the more important developments associated with the OraclePhoneDB
(so named because we're using an Oracle database right now) is an optimized
POP assignment system, usually referred to as POPtimization.  The goal of
POPtimization is to assign POPs on an individual basis, rather than on an
exchange area basis.

The current load balancing system has a number of flaws, but the biggest of
them is that it doesn't consider groups of people.  When you log in, it
looks at the usage percentages assigned to the different providers, looks
at your serial number, makes an assignment, and then forgets all about
you.  It doesn't know how many ports each POP has, and even if it did, it
wouldn't know how many users have had that POP assigned to them, or which
of those users is likely to dial in during peak hours.

POPtimization takes into account customer usage patterns (like number of
hours per month and typical time of day logging in), POP capacity, and
several other factors, and assigns POPs to all users in an entire region.
This allows us to use all the capacity that is available while minimizing
costs.

The aspect of POPtimization that most directly affects Customer Care and
Operations is that tellyscripts can now hold multiple sets of POPs, and can
invoke a different set based on what time it is, what day of the week it
is, and what month it is.  Here is an example of a tellyscript assignment
with two sets, one for October 1997 and one for November 1997:

  Most recent script sent to client:
    Hash 0x91c63c79, sent Wed Oct 22 20:54:41 1997
    v42 base/-
    v2  locale/-
    v0  ---Poptimized/199710
    v1  wpb/3261095
    v2  cnc/6870610
    v1  wpb/16503261095
    v2  cnc/16506870610
    v1  artemis/18004653537
    v0  ---Poptimized/199711
    v1  wpb/3261095
    v2  cnc/6870610
    v1  wpb/16503261095
    v2  cnc/16506870610
    v1  artemis/18004653537

This example shows only two sets, but a tellyscript might have as many as
eight.  The output of clientinfo will also show the sets in a hierarchical
fashion:

  POPtimized assignments:
  MONTH Oct 1997
    DAYS SMTWRFS
      TIMES 00:00 - 00:00
        POP 1 0:650-326-1095 conn=F
        POP 2 1:650-687-0610 conn=P
        POP 3 2:650-687-2255 conn=H
  MONTH Nov 1997
    DAYS SMTWRFS
      TIMES 00:00 - 00:00
        POP 1 0:650-326-1095 conn=F
        POP 2 1:650-687-0610 conn=P
        POP 3 2:650-687-2255 conn=H

The "conn=X" part tells you if the connection is supposed to be (F)lat
rate, (P)er-port, or (H)ourly rate.  These aren't used yet, and can be
ignored for now.

Switching POP sets on calendar month boundaries is especially important
when flat-rate IAPs are used.  The IAP bills us if a call starts in a
particular month, so if we can have the box switch between two flat-rate
providers exactly on a month boundary, we won't end up paying two IAPs for
the same box in one month or the other.


The POPtimization data is determined over the set of existing users, and is
updated periodically.  New users will either get default POPtimization data
for their area, or will just get the standard load-balanced PhoneDB
selection, depending on how we implement it.  For details on how the
POPtimization is performed, contact Joy Mundy (email=joy).


There are a number of operational issues related to OraclePhoneDB and
POPtimization, but it's not really appropriate to list them all here.  Some
notes are available on the http://webhost-1/~fadden/DocArchive/ site, and
Joy has a status page on http://webhost-1/~joy/Poptimization_Status.htm.

The service has a number of safeguards to prevent really bizarre behavior.
For example, every POP in every set must be one of the ones shown by
POP-O-Rama.  There is no way for corruption in the POPtimization tables to
cause a box to dial a POP that is completely wrong unless the PhoneDB
itself is damaged somehow.  The results from the OraclePhoneDB are
currently compared bit-for-bit against the results from the file version,
and the file is checksummed and verified in various ways, so PhoneDB
corruption is unlikely unless the POP lists or PhoneDB tools are screwed
up.  And if the POP lists or PhoneDB tools are broken, then we'll have
problems whether we're using the PhoneDB in Oracle or in a file.

I'm not expecting customers to be adversely affected by POPtimization.


If a box loses power, it loses track of the date and time.  In such an
event, it will behave as if it were Wednesday at 7pm (local time) in the
most recent month in the script.
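
The selection logic, including the power-loss fallback, can be sketched as
below.  This is not the tellyscript itself: the dictionary layout and the
function name are invented for illustration, time ranges that wrap past
midnight are not handled, and returning None when no set matches is a
placeholder, since what the box does in that case isn't covered here.

```python
from datetime import datetime

def pick_pop_set(sets, now=None):
    """Return the POP list that applies at `now`.

    sets: list of dicts with keys "year", "month", "days" (a set of ints,
    0 = Sunday ... 6 = Saturday), "hours" ((start, end); equal values mean
    all day) and "pops".  Pass now=None to model a box that has lost its
    clock.
    """
    if now is None:
        # No clock: act as if it were Wednesday at 7pm in the most
        # recent month in the script.
        year, month = max((s["year"], s["month"]) for s in sets)
        dow, hour = 3, 19
    else:
        year, month, hour = now.year, now.month, now.hour
        dow = (now.weekday() + 1) % 7   # Python's Monday=0 -> Sunday=0
    for s in sets:
        if (s["year"], s["month"]) != (year, month):
            continue
        if dow not in s["days"]:
            continue
        start, end = s["hours"]
        if start == end or start <= hour < end:
            return s["pops"]
    return None
```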



-= V.B =-  Fallback usage cap

This section is rather brief, partly because the details aren't yet
finalized, and partly because I wasn't involved with its design or
implementation.  Comments or questions on this feature should be addressed
to Wiltse Carpenter (wiltse@corp.webtv.net).

The basic idea is to cut our costs by reducing the amount of time that
people spend on the fallback 800 number.  This is accomplished with two
different mechanisms, a per-session limit and a per-month limit.

The per-session limit is like the 10-minute idle timeout that the box has,
except that it forces you to hang up and redial whether you're idle or
not.  The intrusive nature of this timeout is bound to cause complaints.

The per-month limit prevents you from using the fallback number if you have
used more than a set number of hours in a calendar month.  This caps the
per-user cost at a tolerable level, while still allowing relief during
temporary POP outages.  We found that a handful of users accounted for a
large percentage of the costs, so the cap should dramatically reduce costs
while affecting only those few users.

When the monthly usage cap is exceeded, users dialed in through the
fallback number will get an HTML page from the headwaiter telling them that
all local POPs are unavailable.
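
In outline, the two limits are independent checks.  The sketch below is
only meant to pin down the logic; the function names are hypothetical, and
the actual limit values are not given here, so they are left as parameters.

```python
def fallback_allowed(hours_used_this_month, monthly_cap_hours):
    """Per-month limit: the fallback number is refused once the user has
    used more than the cap in the current calendar month."""
    return hours_used_this_month <= monthly_cap_hours

def must_hang_up(session_minutes, session_limit_minutes):
    """Per-session limit: force a hang-up and redial once the session
    reaches the limit, whether or not the user is idle."""
    return session_minutes >= session_limit_minutes
```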

Some regional adjustments may have to be made for areas with chronic POP
problems.



-= V.C =-  MCI

WebTV and MCI have reached an agreement that will allow WebTV customers to
switch to an MCI/WebTV co-branded service.  Such customers will use MCI
POPs exclusively (they may or may not get our fallback number), and will
pay a lower WebTV fee per month if they are also subscribed to MCI's long
distance service.

The trouble with using the MCI POPs is that they only support CHAP
authentication, while the box only supports PAP.  To use their POPs we need
to add CHAP support to the box.

In the meantime we still want to sign up customers for the co-branded
service, so for late 1997 and early 1998 we will be using the normal WebTV
POPs for MCI customers.  This will change after the next client and
service release.

We will have some troubles with flash downloads, because the flash
downloader on "Classic" boxes will always behave like a 1.0 client, and
therefore can't negotiate CHAP.  The tellyscript sent to MCI customers will
have to be able to dial either MCI POPs or normal WebTV POPs, and will
switch based on whether or not the currently executing ROM supports CHAP
authentication.

Because of the potentially large number of MCI customers who will briefly
be using our POPs, we only regard a customer as eligible for MCI if they
have a local MCI POP *and* they have two "normal" local POPs that aren't
from flat-rate IAPs.  It is possible for a customer to gain or lose
eligibility without any changes in MCI's POP list.
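
The eligibility test can be sketched as follows.  The provider
abbreviation "mci", the tuple layout, and the function name are
assumptions, as is treating only the "flat" billing method as flat-rate.

```python
def mci_eligible(local_pops):
    """local_pops: list of (provider, billing) pairs, one per local POP."""
    has_mci = any(provider == "mci" for provider, _ in local_pops)
    normal_non_flat = [
        (provider, billing) for provider, billing in local_pops
        if provider != "mci" and billing != "flat"
    ]
    # Needs a local MCI POP *and* two normal, non-flat-rate local POPs.
    return has_mci and len(normal_non_flat) >= 2
```

This also shows how eligibility can change with no change on MCI's side:
if one of the two "normal" POPs is removed, or its provider moves to flat
billing, the same customer stops qualifying.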



  |                               |
-=*=-  VI. For Further Reading  -=*=-
  |                               |


-= VI.A =-  On the web

Resources available on the web, internally or externally.

  http://webhost-1/~fadden/DocArchive/

    A collection of documents on various subjects, some related to the
    material here, some not.  Take a look sometime.

  http://webhost-1/~fadden/todo_list.html

    My to-do list.  Relevant because most of the items have some relation
    to requested features for PhoneDB generation or dialing.  If something
    hasn't been added but you think it should be, check here.

  http://hyperarchive.lcs.mit.edu/telecom-archives/

    TELECOM Digest archives.  Several years' worth of interesting articles.

  http://frodo.bruderhof.com/areacode/

    Area code split details.

  http://www.areacode-info.com/

    Assorted area code stuff.

  http://www.cnet.com/Content/Reviews/Compare/56kmodems/index.html

    Reviews of 23 56K modems.



-= VI.B =-  In the service source tree

Documents checked into the service source tree.  Consult your friendly
neighborhood tech pubs person for web versions.

  network/src/doc/DialingInfo

    This file!

  network/src/doc/ANICodes

    List of OLS codes (found in the first two digits of the ANI number).

  network/src/doc/IntlPhoneNotes

    A few notes on how the service deals with the phone systems in
    foreign countries, e.g. Japan.

  network/src/doc/POPBalancing

    A detailed technical discussion of the ramifications of POP load
    balancing, written while I was trying to convince myself that the
    system was behaving correctly.

  network/src/doc/PSI

    Description of the changes made to the service to support PSI.

  network/src/phonedb/README

    Tips and tricks for advanced "phonetool" use.

  network/src/clientpopedit/README

    Documentation for the "clientpopedit" tool.

  network/src/dpedit/README

    Documentation for the "dpedit" tool.

  network/src/tool/phonetool/README (and README_JP)

    Documentation for the "phonetool" tool, which is actually a collection
    of tools.  Of particular interest for some people is the table of
    dialing pattern codes that are output by the "dumpnpas" sub-command.

  network/src/tool/rawphonetool/README (and README_JP)

    Documentation for the "rawphonetool" tool, which is actually a
    collection of tools.  This tells you what all the nasty messages
    printed by rgenphonedb mean.

  network/src/tool/psiutil/README

    How to use psiutil, if you are ever unfortunate enough to need it.


That's all, folks...

*** WebTV Confidential ***

